The transition density of a diffusion process does not admit an
explicit expression in general, which prevents the full maximum
likelihood estimation (MLE) based on discretely observed sample paths.
Aït-Sahalia [J. Finance 54 (1999) 1361-1395; Econometrica 70 (2002)
223-262] proposed asymptotic expansions to the transition densities
of diffusion processes, which lead to an approximate maximum
likelihood estimation (AMLE) for parameters. Built on Aït-Sahalia's
[Econometrica 70 (2002) 223-262; Ann. Statist. 36 (2008) 906-937]
proposal and analysis on the AMLE, we establish the consistency
and convergence rate of the AMLE, which reveal the roles played by
the number of terms used in the asymptotic density expansions and
the sampling interval between successive observations. We find
conditions under which the AMLE has the same asymptotic distribution
as that of the full MLE. A first order approximation to the Fisher
information matrix is proposed.
1. Introduction. Continuous-time diffusion processes defined by
stochastic differential equations [Karatzas and Shreve (1991), Øksendal
(2000), Protter (2004)] are the basic stochastic modeling tools in modern
financial theory and applications. Diffusion models are commonly employed
to describe the price dynamics of a financial asset or a portfolio of assets.
An eminent application is in deriving the price of a derivative contract on
an asset or a group of assets. The celebrated Black-Scholes-Merton option
pricing formula [Black and Scholes (1973), Merton (1973)] was obtained
by assuming that the underlying asset followed a geometric Brownian
motion such that the log price process of the underlying asset followed an
Ornstein-Uhlenbeck diffusion process. The widely used Vasicek (1977) and
Cox, Ingersoll and Ross (1985) pricing formulas for the zero coupon bond
were developed based on two specific mean-reverting diffusion processes with
a constant or the square root [Feller (1952)] diffusion functions, respectively.
Other pricing formulas have also been developed for assets defined by other
processes; see Bakshi, Cao and Chen (1997) and Dumas, Fleming and
Whaley (1998). In the implementations of the aforementioned pricing formulas,
the parameters of the diffusion processes which describe the underlying
assets' dynamics have to be estimated based on empirical observations.
Sundaresan (2000) gave a comprehensive survey on the financial applications
of continuous-time stochastic models, which were largely diffusion
processes. Fan (2005) provided an overview on nonparametric estimation for
diffusion processes. Other related works include Bibby and Sørensen (1995),
Wang (2002), Fan and Zhang (2003), Fan and Wang (2007), Mykland and
Zhang (2009) and Aït-Sahalia, Mykland and Zhang (2011).

There are several challenges to be faced when estimating parameters of 
diffusion processes. One challenge is that despite being continuous-time mod- 
els, the processes are only observed at discrete time points rather than ob- 
served continuously over time. The discrete observations prevent the use of 
the relatively straightforward likelihood expressions [Prakasa Rao (1999)] 
available for continuously observed diffusion processes. Another challenge is 
that despite the fact that the diffusion processes are Markovian, their tran- 
sition densities from one time point to the next do not have finite analytic 
expressions, except for only a few specific processes. This means that the effi- 
cient maximum likelihood estimation (MLE) cannot be readily implemented 
for most of these processes. 

In ground-breaking works, Aït-Sahalia (1999, 2002) established series
expansions to approximate the transition densities of univariate diffusion
processes. Similar expansions have been proposed for multivariate processes
in Aït-Sahalia (2008). These density approximations, as advocated by
Aït-Sahalia, are then employed to form approximate likelihood functions, which
are maximized to obtain the approximate maximum likelihood estimators
(AMLEs). Aït-Sahalia (2002, 2008) demonstrated that the approximate
likelihood converges to the true likelihood as the number of terms in the series
expansions goes to infinity. He also provided some results on the consistency
of the AMLEs. Numerical evaluations of the transition density
approximations, as conducted in Aït-Sahalia (1999), Stramer and Yan (2007a, 2007b)
and others, have shown good performance in the numerical approximation of
the underlying transition densities. The approach has opened a very
accessible route for obtaining parameter estimators for diffusion processes, and for
estimating other quantities which are functions of the transition density, as
commonly encountered in finance. Indeed, Aït-Sahalia and Kimmel (2007,
2010) demonstrated two such applications in stochastic volatility models
and the affine term structure models, respectively. Tang and Chen (2009)
provided some results on the AMLE based on the one-term expansion for
the mean-reverting processes. They revealed that there was an extra leading
order bias term in the AMLE due to the density approximation.

Although the above-mentioned results on the transition density
approximation and the AMLE have been provided, some key questions
remain to be addressed. One is on the consistency of the AMLE. While
Aït-Sahalia (2002, 2008) contained some results on consistency, there is more
to be explored. There are two key ingredients in Aït-Sahalia's density
approximation. One is $J$, the number of terms used in the approximation, and
the other is $\delta$, the length of the sampling interval between successive
observations. In this paper, we study explicitly the roles played by $J$ and
$\delta$ on the consistency of the AMLE, and quantify their roles on the
convergence rate. Another question is under what conditions on $J$ and $\delta$
the AMLE has the same asymptotic distribution as the full MLE. Here, we
consider two regimes: (i) $\delta$ is fixed, and $J \to \infty$; (ii) $J$ is fixed,
but $\delta \to 0$, representing two views of asymptotics. In the case of
$\delta \to 0$, it is found that $J \ge 2$ is necessary to ensure that the AMLE
has the same asymptotic normality as the MLE. Like the transition density, the
Fisher information matrix, the quantity that defines the efficiency of the full
MLE, is generally not available analytically, even when the underlying
transition density is known. We show in this paper that
an approximation to the Fisher information matrix can be obtained based
on the one-term density approximation.

The paper is organized as follows. In Section 2, we outline the transition
density approximations of Aït-Sahalia (1999, 2002). Some preliminary
analysis needed for studying the AMLE is presented in Section 3. Section 4
establishes the consistency and convergence rates of the AMLE. Asymptotic 
normality of the AMLE and its equivalence to the full MLE are addressed in 
Section 5. Section 6 discusses the approximation for the Fisher information 
matrix. Simulation results are reported in Section 7. Technical conditions 
and details of proofs are relegated to the Appendix. 

2. Transition density approximation. Consider a univariate diffusion
process $(X_t)_{t \ge 0}$ defined by a stochastic differential equation

(2.1)    $dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dB_t,$

where $\mu$ and $\sigma$ are, respectively, the drift and diffusion functions and
$B_t$ is the standard Brownian motion. Both the drift and diffusion functions are
known except for an unknown parameter vector $\theta$ taking values in a set
$\Theta \subset \mathbb{R}^d$.

Given a sampling interval $\delta > 0$, let $f_X(x|x_0,\delta;\theta)$ be the
transition density of $X_{t+\delta}$ given $X_t = x_0$ for
$(x_0, x) \in \mathcal{X} \times \mathcal{X}$, where $\mathcal{X}$ is the domain of $X_t$.
Despite the parametric forms of the drift and the diffusion functions that
are available in (2.1), a closed-form expression for $f_X(x|x_0,\delta;\theta)$ is
not generally available for most of the processes. In most cases, the density is only
known to satisfy the Kolmogorov backward and forward partial differential
equations. In ground-breaking works, Aït-Sahalia (1999, 2002) proposed
asymptotic expansions to approximate the transition density.

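The approach of Aït-Sahalia is the following. He first transformed $X_t$ to
a diffusion process with unit diffusion function by

(2.2)    $Y_t = \gamma(X_t;\theta) = \int^{X_t} \frac{du}{\sigma(u;\theta)},$

which satisfies $dY_t = \mu_Y(Y_t;\theta)\,dt + dB_t$, where

$\mu_Y(y;\theta) = \frac{\mu(\gamma^{-1}(y;\theta);\theta)}{\sigma(\gamma^{-1}(y;\theta);\theta)} - \frac{1}{2}\frac{\partial\sigma}{\partial x}(\gamma^{-1}(y;\theta);\theta).$

Let $f_Y(y|y_0,\delta;\theta)$ be the transition density of $Y_{t+\delta}$ given
$Y_t = y_0$. The two density functions are related according to

(2.3)    $f_X(x_t|x_{t-1},\delta;\theta) = \sigma^{-1}(x_t;\theta)\, f_Y(\gamma(x_t;\theta)|\gamma(x_{t-1};\theta),\delta;\theta).$

To ensure convergence of the expansions, Aït-Sahalia standardized $Y_{t+\delta}$
by $Z_{t+\delta} = \delta^{-1/2}(Y_{t+\delta} - y_0)$. Let $f_Z(z|y_0,\delta;\theta)$
denote the conditional density of $Z_{t+\delta}$ given $Z_t = 0$ (that is,
$Y_t = y_0$), which is related to $f_Y$ by

$f_Z(z|y_0,\delta;\theta) = \delta^{1/2} f_Y(\delta^{1/2}z + y_0|y_0,\delta;\theta).$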
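
The transformation (2.2) and the induced drift $\mu_Y$ can be checked numerically on a concrete model. The sketch below is only an illustration: it assumes a square-root (CIR-type) process $dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t$ with hypothetical parameter values, for which $\gamma(x) = 2\sqrt{x}/\sigma$, and compares the general formula for $\mu_Y$ with the closed form $(2\kappa\alpha/\sigma^2 - 1/2)/y - \kappa y/2$ obtained by hand. All function names are ours.

```python
import math

# Hypothetical CIR parameters, chosen only for illustration.
KAPPA, ALPHA, SIGMA = 0.5, 0.06, 0.15

def mu(x):
    """Drift of the CIR process: kappa * (alpha - x)."""
    return KAPPA * (ALPHA - x)

def sigma(x):
    """Diffusion function of the CIR process: sigma * sqrt(x)."""
    return SIGMA * math.sqrt(x)

def gamma(x):
    """Transformation (2.2); for CIR the integral equals 2 * sqrt(x) / sigma."""
    return 2.0 * math.sqrt(x) / SIGMA

def gamma_inv(y):
    """Inverse of gamma: x = (sigma * y / 2)^2."""
    return (SIGMA * y / 2.0) ** 2

def mu_Y_generic(y, h=1e-6):
    """mu_Y from the general formula, with d sigma / dx by central differences."""
    x = gamma_inv(y)
    dsigma_dx = (sigma(x + h) - sigma(x - h)) / (2.0 * h)
    return mu(x) / sigma(x) - 0.5 * dsigma_dx

def mu_Y_closed_form(y):
    """Closed form obtained by substituting x = (sigma * y / 2)^2 by hand."""
    return (2.0 * KAPPA * ALPHA / SIGMA ** 2 - 0.5) / y - KAPPA * y / 2.0

y0 = gamma(0.05)
# Largest disagreement between the two expressions at three test points.
max_gap = max(abs(mu_Y_generic(y) - mu_Y_closed_form(y))
              for y in (0.5 * y0, y0, 2.0 * y0))
```

The two expressions agree up to finite-difference error, which is the kind of check one may want before feeding a hand-derived $\mu_Y$ into the expansions that follow.
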
Let $\{H_j(z)\}_{j=0}^{\infty}$ be the Hermite polynomials

$H_j(z) = \phi^{-1}(z)\frac{d^j\phi(z)}{dz^j},$

which are orthogonal with respect to the standard normal density $\phi$, namely
$\int H_j(z)H_k(z)\phi(z)\,dz = 0$ if $j \ne k$. A formal Hermite orthogonal series
expansion to the density $f_Z(z|y_0,\delta;\theta)$ is

(2.4)    $f_Z(z|y_0,\delta;\theta) = \phi(z)\sum_{j=0}^{\infty}\eta_j(y_0,\delta;\theta)H_j(z),$

where the coefficients

$\eta_j(y_0,\delta;\theta) = (j!)^{-1}\int H_j(z)f_Z(z|y_0,\delta;\theta)\,dz$

$\qquad = (j!)^{-1}E[H_j(\delta^{-1/2}(Y_{t+\delta} - y_0))|Y_t = y_0;\theta].$
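
For readers who want to experiment with the Hermite system, the following minimal sketch evaluates $H_j$ through the standard three-term recurrence for the probabilists' Hermite polynomials $He_j$ (using $H_j = (-1)^j He_j$) and checks the orthogonality relation by a plain trapezoidal rule. It is an illustration under our own choices of grid and function names; it also shows that $\int H_j^2\phi\,dz = j!$, which is why the factor $(j!)^{-1}$ appears in $\eta_j$.

```python
import math

def hermite_H(j, z):
    """H_j(z) = phi(z)^{-1} d^j phi(z) / dz^j, which equals (-1)^j He_j(z).

    Uses the recurrence He_{k+1}(z) = z * He_k(z) - k * He_{k-1}(z).
    """
    if j == 0:
        return 1.0
    he_prev, he = 1.0, z              # He_0 and He_1
    for k in range(1, j):
        he_prev, he = he, z * he - k * he_prev
    return (-1) ** j * he

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def inner(j, k, half_width=12.0, steps=24000):
    """Trapezoidal approximation of the integral of H_j * H_k * phi over the real line."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * hermite_H(j, z) * hermite_H(k, z) * phi(z)
    return total * h
```

For instance, `inner(2, 3)` is numerically zero while `inner(3, 3)` is close to $3! = 6$.
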

The last conditional expectation has no analytic expression in general,
although it may be simulated using the method proposed in Beskos et al.
(2006). Aït-Sahalia proposed Taylor expansions for this conditional
expectation with respect to the sampling interval $\delta$ based on the
infinitesimal generator of $Y_t$. For a twice continuously differentiable
function $g$, the infinitesimal generator of $Y_t$ is

(2.5)    $\mathcal{A}_\theta g(y) = \mu_Y(y;\theta)\frac{\partial g(y)}{\partial y} + \frac{1}{2}\frac{\partial^2 g(y)}{\partial y^2}.$

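A $K$-term Taylor series expansion to
$E[H_j(\delta^{-1/2}(Y_{t+\delta} - y_0))|Y_t = y_0;\theta]$ is

(2.6)    $E[H_j(\delta^{-1/2}(Y_{t+\delta} - y_0))|Y_t = y_0;\theta]$

$\qquad = \sum_{k=0}^{K}\mathcal{A}_\theta^k H_j(\delta^{-1/2}(y - y_0))\big|_{y = y_0}\frac{\delta^k}{k!}$

$\qquad\quad + E[\mathcal{A}_\theta^{K+1}H_j(\delta^{-1/2}(Y_{t+\delta^*} - y_0))|Y_t = y_0;\theta]\frac{\delta^{K+1}}{(K+1)!}.$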
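
As a small worked illustration of (2.6) (our own example, assuming $\mu_Y$ is smooth enough for the remainder to be of the stated order), take $j = 1$, for which $H_1(z) = \phi^{-1}(z)\phi'(z) = -z$. Writing $g(y) = H_1(\delta^{-1/2}(y - y_0)) = -\delta^{-1/2}(y - y_0)$, we have $g(y_0) = 0$ and, since $g'' = 0$, the generator (2.5) gives $\mathcal{A}_\theta g(y) = -\delta^{-1/2}\mu_Y(y;\theta)$. Keeping the terms $k = 0, 1$:

```latex
\begin{align*}
E[H_1(\delta^{-1/2}(Y_{t+\delta} - y_0)) \mid Y_t = y_0;\theta]
  &= g(y_0) + \delta\,\mathcal{A}_\theta g(y)\big|_{y = y_0} + O(\delta^{3/2}) \\
  &= -\delta^{1/2}\mu_Y(y_0;\theta) + O(\delta^{3/2}).
\end{align*}
```

This agrees with the heuristic $E[Y_{t+\delta} - y_0 \mid Y_t = y_0] \approx \mu_Y(y_0;\theta)\delta$, and it shows how each coefficient in (2.4) becomes a power series in $\delta$ once (2.6) is substituted.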



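Substituting (2.6) to the orthogonal expansion (2.4) followed by gathering
terms according to the powers of $\delta$, a $J$-term expansion to the transition
density $f_Y(y|y_0,\delta;\theta)$ is

$f_Y^{(J)}(y|y_0,\delta;\theta) = \delta^{-1/2}\phi\left(\frac{y - y_0}{\delta^{1/2}}\right)\exp\left(\int_{y_0}^{y}\mu_Y(u;\theta)\,du\right)\sum_{j=0}^{J}c_j(y|y_0;\theta)\frac{\delta^j}{j!},$

where $c_0(y|y_0;\theta) = 1$ and for $j \ge 1$,

$c_j(y|y_0;\theta) = j(y - y_0)^{-j}\int_{y_0}^{y}(w - y_0)^{j-1}\left\{\lambda_Y(w;\theta)c_{j-1}(w|y_0;\theta) + \frac{1}{2}\frac{\partial^2 c_{j-1}(w|y_0;\theta)}{\partial w^2}\right\}dw.$

Here $\lambda_Y(y;\theta) = -\{\mu_Y^2(y;\theta) + \partial\mu_Y(y;\theta)/\partial y\}/2$.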
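
The expansion can be tried out on a model whose transition density is known. The sketch below is an illustration under our own assumptions: a unit-diffusion Ornstein-Uhlenbeck process $dY_t = -\kappa Y_t\,dt + dB_t$ with a hypothetical $\kappa$, only the cases $J = 0$ and $J = 1$, a finite-difference derivative for $\lambda_Y$ and Simpson quadrature for the integrals. It compares $f_Y^{(J)}$ with the exact Gaussian transition density.

```python
import math

KAPPA = 0.5   # hypothetical mean-reversion rate, for illustration only

def mu_Y(y):
    """Drift of the unit-diffusion process dY = -kappa * Y dt + dB."""
    return -KAPPA * y

def lambda_Y(y, h=1e-5):
    """lambda_Y = -(mu_Y^2 + d mu_Y / dy) / 2, derivative by central differences."""
    dmu = (mu_Y(y + h) - mu_Y(y - h)) / (2.0 * h)
    return -0.5 * (mu_Y(y) ** 2 + dmu)

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0

def c1(y, y0):
    """First coefficient: (y - y0)^{-1} times the integral of lambda_Y over [y0, y]."""
    if y == y0:
        return lambda_Y(y0)           # limiting value as y -> y0
    return simpson(lambda_Y, y0, y) / (y - y0)

def f_Y_approx(y, y0, delta, J):
    """J-term expansion of f_Y, implemented for J in {0, 1} only."""
    z = (y - y0) / math.sqrt(delta)
    lead = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * delta)
    series = 1.0 + (c1(y, y0) * delta if J >= 1 else 0.0)
    return lead * math.exp(simpson(mu_Y, y0, y)) * series

def f_Y_exact(y, y0, delta):
    """Exact Gaussian transition density of the Ornstein-Uhlenbeck process."""
    mean = y0 * math.exp(-KAPPA * delta)
    var = (1.0 - math.exp(-2.0 * KAPPA * delta)) / (2.0 * KAPPA)
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

exact = f_Y_exact(0.5, 0.3, 0.1)
err0 = abs(f_Y_approx(0.5, 0.3, 0.1, J=0) - exact) / exact   # relative error, J = 0
err1 = abs(f_Y_approx(0.5, 0.3, 0.1, J=1) - exact) / exact   # relative error, J = 1
```

At this test point the relative error drops by roughly two orders of magnitude when the $c_1$ term is included, in line with the extra power of $\delta$ gained per term.
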

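Transforming back from $y$ to $x$ via (2.2) and (2.3), the $J$-term expansion
to $f_X(x|x_0,\delta;\theta)$ is

$f_X^{(J)}(x|x_0,\delta;\theta) = \sigma^{-1}(x;\theta)\,\delta^{-1/2}\phi\left(\frac{\gamma(x;\theta) - \gamma(x_0;\theta)}{\delta^{1/2}}\right)$

$\qquad \times \exp\left\{\int_{x_0}^{x}\frac{\mu_Y(\gamma(u;\theta);\theta)}{\sigma(u;\theta)}\,du\right\}\sum_{j=0}^{J}c_j(\gamma(x;\theta)|\gamma(x_0;\theta);\theta)\frac{\delta^j}{j!}.$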

Although it employs the Hermite polynomials and has the Gaussian density
as the leading term as an Edgeworth expansion does, the transition density
expansion is not an Edgeworth expansion. This is because the latter is for
density functions of statistics admitting the central limit theorem, which
differs from the current context of expanding the transition density.
Aït-Sahalia (2002) demonstrated that as $J \to \infty$,

(2.7)    $f_X^{(J)}(x|x_0,\delta;\theta) \to f_X(x|x_0,\delta;\theta)$

uniformly with respect to $\theta \in \Theta$ and $x_0$ over compact subsets of
$\mathcal{X}$. The convergence is also uniform with respect to $x$ over subsets of
$\mathcal{X}$ depending on the property of $\sigma(x;\theta)$.
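Define

$A_1(x|x_0,\delta;\theta) = -\log\{\sigma(x;\theta)\} - \frac{1}{2\delta}\{\gamma(x;\theta) - \gamma(x_0;\theta)\}^2,$

$A_2(x|x_0,\delta;\theta) = \int_{x_0}^{x}\frac{\mu_Y(\gamma(u;\theta);\theta)}{\sigma(u;\theta)}\,du$

and

$A_3(x|x_0,\delta;\theta) = \log\left\{\sum_{j=0}^{J}c_j(\gamma(x;\theta)|\gamma(x_0;\theta);\theta)\,\delta^j/j!\right\}.$

If $\sum_{j=0}^{\infty}|c_j(y|y_0;\theta)|\delta^j/j! < \infty$ on
$\mathcal{Y} \times \mathcal{Y}$ with probability one, where $\mathcal{Y}$ is
the domain of $Y_t$, we can define
$\tilde{A}_3(x|x_0,\delta;\theta) = \log\{\sum_{j=0}^{\infty}c_j(\gamma(x;\theta)|\gamma(x_0;\theta);\theta)\,\delta^j/j!\}$.
Then the result in (2.7) implies that

(2.8)    $\log f_X(x|x_0,\delta;\theta) = -\log\sqrt{2\pi\delta} + A_1(x|x_0,\delta;\theta) + A_2(x|x_0,\delta;\theta) + \tilde{A}_3(x|x_0,\delta;\theta).$

Expression (2.8) is the starting point for our analysis.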

Given a set of discrete observations $\{X_{t\delta}\}_{t=1}^{n}$ with equal
sampling length $\delta$ of the diffusion process $(X_t)_{t \ge 0}$, to simplify
notation, we write $X_t$ for $X_{t\delta}$ and hide $\delta$ in the expressions
for the transition density $f_X$ and its approximations. At the same time, we use
$f$ and $f^{(J)}$ to express $f_X$ and $f_X^{(J)}$, respectively. Based on the
$J$-term expansion to the true transition density, the $J$-term approximate
log-likelihood function given in Aït-Sahalia (2002) is

$\ell^{(J)}_{n,\delta}(\theta) = -n\log\sqrt{2\pi\delta} + \sum_{t=1}^{n}A_1(X_t|X_{t-1},\delta;\theta) + \sum_{t=1}^{n}A_2(X_t|X_{t-1},\delta;\theta) + \sum_{t=1}^{n}A_3(X_t|X_{t-1},\delta;\theta).$

Let $\hat\theta^{(J)}_{n,\delta} = \arg\max_{\theta\in\Theta}\ell^{(J)}_{n,\delta}(\theta)$
be the approximate MLE (AMLE) and $\hat\theta_{n,\delta}$ be the true MLE that
maximizes the full likelihood

$\ell_{n,\delta}(\theta) = \sum_{t=1}^{n}\log f(X_t|X_{t-1},\delta;\theta).$

To keep the notation simple, we write $\hat\theta^{(J)}_n = \hat\theta^{(J)}_{n,\delta}$
and $\hat\theta_n = \hat\theta_{n,\delta}$ by suppressing $\delta$ in subscripts.
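
The structure of the approximate log-likelihood can be sketched in code. The example below is a minimal illustration, not an implementation from the paper: it assumes a unit-diffusion Ornstein-Uhlenbeck model ($\sigma \equiv 1$, $\gamma(x) = x$, $\mu_Y(y) = -\kappa y$), uses $J = 1$ with the closed-form coefficient $c_1 = \kappa/2 - \kappa^2(x^2 + x x_0 + x_0^2)/6$ that follows from the recursion for this model, and evaluates both likelihoods on a short hypothetical path. The names `a1`, `a2`, `a3`, `approx_loglik` and `exact_loglik` are ours.

```python
import math

def a1(x, x0, delta):
    """A_1 with sigma = 1 and gamma(x) = x: -(x - x0)^2 / (2 * delta)."""
    return -((x - x0) ** 2) / (2.0 * delta)

def a2(x, x0, kappa):
    """A_2: integral of mu_Y(u) = -kappa * u from x0 to x."""
    return -0.5 * kappa * (x * x - x0 * x0)

def a3(x, x0, delta, kappa):
    """A_3 for J = 1: log(1 + c_1 * delta), with c_1 in closed form for this model."""
    c1 = 0.5 * kappa - kappa ** 2 * (x * x + x * x0 + x0 * x0) / 6.0
    return math.log(1.0 + c1 * delta)

def approx_loglik(path, delta, kappa):
    """One-term approximate log-likelihood of an equally spaced path."""
    n = len(path) - 1
    total = -n * math.log(math.sqrt(2.0 * math.pi * delta))
    for x0, x in zip(path[:-1], path[1:]):
        total += a1(x, x0, delta) + a2(x, x0, kappa) + a3(x, x0, delta, kappa)
    return total

def exact_loglik(path, delta, kappa):
    """Exact log-likelihood from the Gaussian Ornstein-Uhlenbeck transition density."""
    var = (1.0 - math.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
    total = 0.0
    for x0, x in zip(path[:-1], path[1:]):
        mean = x0 * math.exp(-kappa * delta)
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var
    return total

path = [0.3, 0.5, 0.2, 0.4]     # hypothetical observations, delta = 0.1 apart
gap = abs(approx_loglik(path, 0.1, 0.5) - exact_loglik(path, 0.1, 0.5))
```

In practice the AMLE would be obtained by passing `approx_loglik` (as a function of the parameter) to any numerical optimizer; the point here is only that the three sums $A_1$, $A_2$, $A_3$ are cheap to evaluate and track the exact log-likelihood closely when $\delta$ is small.
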

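3. Preliminaries. Under regular circumstances as assumed by
condition (A.2)(ii) in the Appendix, the full MLE $\hat\theta_n$ and the $J$-term
approximate MLE $\hat\theta^{(J)}_n$ satisfy their respective likelihood score
equations so that

(3.1)    $\sum_{t=1}^{n}\nabla_\theta\log f(X_t|X_{t-1},\delta;\hat\theta_n) = \sum_{t=1}^{n}\nabla_\theta\log f^{(J)}(X_t|X_{t-1},\delta;\hat\theta^{(J)}_n) = 0.$

Subtracting $\sum_{t=1}^{n}\nabla_\theta\log f^{(J)}(X_t|X_{t-1},\delta;\theta_0)$
from both sides of (3.1),

(3.2)    $\sum_{t=1}^{n}\nabla_\theta\log f^{(J)}(X_t|X_{t-1},\delta;\hat\theta^{(J)}_n) - \sum_{t=1}^{n}\nabla_\theta\log f^{(J)}(X_t|X_{t-1},\delta;\theta_0)$

$\qquad = \sum_{t=1}^{n}\nabla_\theta[\tilde{A}_3(X_t|X_{t-1},\delta;\theta_0) - A_3(X_t|X_{t-1},\delta;\theta_0)]$

$\qquad\quad + \sum_{t=1}^{n}\nabla_\theta\log f(X_t|X_{t-1},\delta;\hat\theta_n) - \sum_{t=1}^{n}\nabla_\theta\log f(X_t|X_{t-1},\delta;\theta_0).$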

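Carrying out Taylor expansions on both sides of (3.2), we can get

(3.3)    $\frac{1}{n}\sum_{t=1}^{n}\nabla^2_{\theta\theta}\log f^{(J)}(X_t|X_{t-1},\delta;\theta_0)\cdot(\hat\theta^{(J)}_n - \theta_0)$

$\qquad\quad + \frac{1}{2}[E_d\otimes(\hat\theta^{(J)}_n - \theta_0)']\cdot\frac{1}{n}\sum_{t=1}^{n}\nabla^3_{\theta\theta\theta}\log f^{(J)}(X_t|X_{t-1},\delta;\bar\theta)\cdot(\hat\theta^{(J)}_n - \theta_0)$

$\qquad = \frac{1}{n}\sum_{t=1}^{n}\nabla_\theta[\tilde{A}_3(X_t|X_{t-1},\delta;\theta_0) - A_3(X_t|X_{t-1},\delta;\theta_0)]$

$\qquad\quad + \frac{1}{n}\sum_{t=1}^{n}\nabla^2_{\theta\theta}\log f(X_t|X_{t-1},\delta;\theta_0)\cdot(\hat\theta_n - \theta_0)$

$\qquad\quad + \frac{1}{2}[E_d\otimes(\hat\theta_n - \theta_0)']\cdot\frac{1}{n}\sum_{t=1}^{n}\nabla^3_{\theta\theta\theta}\log f(X_t|X_{t-1},\delta;\tilde\theta)\cdot(\hat\theta_n - \theta_0),$

where $E_d$ is the $d \times d$ identity matrix, $\bar\theta$ is on the line segment
joining $\hat\theta^{(J)}_n$ and $\theta_0$, and $\tilde\theta$ is on the line segment
joining $\hat\theta_n$ and $\theta_0$. Here we define

$\nabla^3_{\theta\theta\theta}\log f(X_t|X_{t-1},\delta;\theta) := \begin{pmatrix}\partial^3\log f(X_t|X_{t-1},\delta;\theta)/\partial\theta\,\partial\theta'\,\partial\theta_1 \\ \vdots \\ \partial^3\log f(X_t|X_{t-1},\delta;\theta)/\partial\theta\,\partial\theta'\,\partial\theta_d\end{pmatrix},$

which is a $d^2 \times d$ matrix, and
$\nabla^3_{\theta\theta\theta}\log f^{(J)}(X_t|X_{t-1},\delta;\theta)$ is similarly defined.
Furthermore, let

$F_n(\theta_0,J,\delta) = n^{-1}\sum_{t=1}^{n}\nabla^2_{\theta\theta}[\tilde{A}_3(X_t|X_{t-1},\delta;\theta_0) - A_3(X_t|X_{t-1},\delta;\theta_0)],$

$U_n(\theta_0,J,\delta) = n^{-1}\sum_{t=1}^{n}\nabla_\theta[\tilde{A}_3(X_t|X_{t-1},\delta;\theta_0) - A_3(X_t|X_{t-1},\delta;\theta_0)]$

and

$N_n(\theta_0,J,\delta) = n^{-1}\sum_{t=1}^{n}\nabla^2_{\theta\theta}\log f^{(J)}(X_t|X_{t-1},\delta;\theta_0).$

Then (3.3) can be written as

(3.4)    $N_n(\theta_0,J,\delta)(\hat\theta^{(J)}_n - \theta_0) + A_{n1}(\hat\theta^{(J)}_n,\theta_0)$

$\qquad = U_n(\theta_0,J,\delta) + [N_n(\theta_0,J,\delta) + F_n(\theta_0,J,\delta)](\hat\theta_n - \theta_0) + A_{n2}(\hat\theta_n,\theta_0),$

where $A_{n1}(\hat\theta^{(J)}_n,\theta_0)$ and $A_{n2}(\hat\theta_n,\theta_0)$ denote
the remainder terms whose explicit expressions can be obtained by matching (3.3)
with (3.4).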

Expansion (3.4) is the starting point in our studies for the consistency and
asymptotic distribution of the AMLE. Indeed, the asymptotic properties of
the AMLE will be evaluated under two regimes regarding $J$ and $\delta$. The first
one is that

(3.5)    $\delta$ is fixed but $J \to \infty$,

which is the situation considered in Aït-Sahalia (2002). The second regime
allows that

(3.6)    $J$ is fixed, $\delta \to 0$ but $n\delta \to \infty$,

which is more in tune with an implementation of the density approximation
with a fixed number of terms.

We will first present some results which are valid for any fixed $J$ and $\delta$.
Let $\|A\|_2 = \{\rho(A'A)\}^{1/2}$ be the spectral norm of a matrix $A$, where
$\rho(A'A)$ denotes the largest eigenvalue of $A'A$. The following proposition
describes properties for the quantities that appear in (3.4).

Proposition 1. Under conditions (A.1), (A.3), (A.4), (A.6), (A.7) given
in the Appendix, there exists a positive constant $\Delta$ such that for any positive
integer $J$ and $\delta \in (0, \Delta)$:

(a) $E\{F_n(\theta_0,J,\delta)\}$, $E\{U_n(\theta_0,J,\delta)\}$ and $E\{N_n(\theta_0,J,\delta)\}$ exist;

(b) $A_{n1}(\hat\theta^{(J)}_n,\theta_0) = O_p\{\|\hat\theta^{(J)}_n - \theta_0\|_2^2\}$ and
$A_{n2}(\hat\theta_n,\theta_0) = O_p\{\|\hat\theta_n - \theta_0\|_2^2\}$ as $n \to \infty$.

Let $I(\delta) = -E\nabla^2_{\theta\theta}\log f(X_t|X_{t-1},\delta;\theta_0)$ be the
Fisher information matrix, which we assume is invertible in condition (A.5). It is
expected that the expected value of $N_n(\theta_0,J,\delta)$, denoted by
$N(\theta_0,J,\delta)$, will converge to $-I(\delta)$, as $J \to \infty$ for each
fixed $\delta$ or with $J$ being fixed but $\delta \to 0$. The following proposition
bounds the difference between $N(\theta_0,J,\delta)$ and $-I(\delta)$ for each
fixed $J$ and $\delta$.

Proposition 2. Under conditions (A.1), (A.4), (A.6), (A.7) given in
the Appendix, there exist two positive constants $\Delta$ and $C$, that are not
dependent on $J$ and $\delta$, such that for any positive integer $J$ and
$\delta \in (0, \Delta)$,

$\|N(\theta_0,J,\delta) + I(\delta)\|_2 \le C\delta^{J+1}.$

As $I(\delta)$ is invertible for each fixed $\delta > 0$, $N_n(\theta_0,J,\delta)$
will be invertible with probability approaching one as $J \to \infty$ for a fixed
$\delta$. However, if $\delta \to 0$, the limit of the Fisher information
$I(0) := \lim_{\delta\to0}I(\delta)$, as well as $N(\theta_0,J,0)$, may be singular.
This is the case for some Ornstein-Uhlenbeck processes as shown in Section 6. The
following proposition provides another account on $N(\theta_0,J,\delta)$ and its
deviation from $-I(\delta)$, as well as the convergence of
$N^{-1}(\theta_0,J,\delta)U(\theta_0,J,\delta)$, where $U(\theta_0,J,\delta)$ denotes
the expected value of $U_n(\theta_0,J,\delta)$ for each pair of fixed $J$ and $\delta$.

Proposition 3. Under conditions (A.1), (A.3)-(A.7) given in the
Appendix, there exist two constants $C_1, C_2$, that are not dependent on $J$ and
$\delta$, and a constant $\Delta > 0$ such that for any positive integer $J$ and
$\delta \in (0, \Delta)$,

$\|N^{-1}(\theta_0,J,\delta)I(\delta) + E_d\|_2 \le C_1\delta^{J}$ and $\|N^{-1}(\theta_0,J,\delta)U(\theta_0,J,\delta)\|_2 \le C_2\delta^{J}.$

The next proposition describes the convergence rate for the difference
between the first derivatives of the full log-likelihood and the approximate
log-likelihood.

Proposition 4. Under conditions (A.1), (A.4), (A.6), (A.7) given in
the Appendix, there exist two finite positive constants $\Delta$ and $C$, not
dependent on $J$ and $\delta$, such that for any $J$, $\delta \in (0, \Delta]$ and $n$,

$E\left\{\sup_{\theta\in\Theta}\|n^{-1}\cdot\nabla_\theta[\ell_{n,\delta}(\theta) - \ell^{(J)}_{n,\delta}(\theta)]\|_2\right\} \le C\delta^{J+1}.$

The following proposition together with Proposition 4 is needed to
establish the consistency of the AMLE.

Proposition 5. Under conditions (A.1), (A.3), (A.4), (A.6), (A.7) given
in the Appendix, there exists a constant $\tilde\Delta > 0$ such that

$\sup_{\theta\in\Theta}\left\|\frac{1}{n}\sum_{t=1}^{n}\nabla_\theta\log f(X_t|X_{t-1},\delta;\theta) - E\nabla_\theta\log f(X_t|X_{t-1},\delta;\theta)\right\|_2 \stackrel{p}{\to} 0$

for (i) $\delta \in (0, \tilde\Delta]$ being fixed, $n \to \infty$, or (ii) $n \to \infty$,
$\delta \to 0$ but $n\delta \to \infty$.

As the full MLE $\hat\theta_n$ is a key bridge for the AMLE, we report in the
following proposition the asymptotic normality of the MLE, which covers
both the fixed-$\delta$ and the diminishing-$\delta$ cases.

Proposition 6. Under conditions (A.1)-(A.7) given in the Appendix,

$\sqrt{n}\,I^{1/2}(\delta)(\hat\theta_n - \theta_0) \stackrel{d}{\to} N(0, E_d)$ as $n\delta^3 \to \infty$,

where $E_d$ is the $d \times d$ identity matrix.

The requirement of $n\delta^3 \to \infty$ in the above proposition is to cover the
case where $I(0) = \lim_{\delta\to0}I(\delta)$ is singular, as spelled out in the proof
given in the Appendix. If such a case is ruled out, for instance, via the so-called
Jacobsen condition [Jacobsen (2001), Sørensen (2007)], the more standard
$n\delta \to \infty$ is sufficient; see also Gobet (2002) for related results.

4. Consistency. We consider in this section the consistency of the
AMLE $\hat\theta^{(J)}_n$ and establish its convergence rate under the two asymptotic
regimes given in (3.5) and (3.6), respectively. The two asymptotic regimes
were also considered in Aït-Sahalia (2002, 2008). For a fixed sampling
interval $\delta$, Aït-Sahalia (2002) proved that there existed a sequence
$J_n \to \infty$ such that $\hat\theta^{(J_n)}_n - \hat\theta_n \stackrel{p}{\to} 0$ under
$P_{\theta_0}$ as $n \to \infty$, where $P_{\theta_0}$ is the underlying probability
measure. Given the consistency of $\hat\theta_n$, the consistency of
$\hat\theta^{(J_n)}_n$ follows. For a fixed $J$, Aït-Sahalia (2008) proved that there
existed a sequence $\{\delta_n\}$ vanishing to zero such that
$\sqrt{n}\,I^{1/2}(\delta_n)(\hat\theta^{(J)}_{n,\delta_n} - \theta_0) = O_p(1)$.
In this paper, we will give more explicit guidelines on how to select the
aforementioned sequences $J_n$ and $\delta_n$ so that the AMLE is consistent. Our
study here begins with (3.1), which together with Propositions 4 and 5
leads to the following result on the consistency of the AMLE under the two
asymptotic regimes, respectively.

Theorem 1. Under conditions (A.1)-(A.4), (A.6), (A.7) given in the
Appendix, $\hat\theta^{(J)}_n - \theta_0 \stackrel{p}{\to} 0$ under either:
(i) $\delta \in (0, \Delta\wedge\tilde\Delta]$ being fixed, $J \to \infty$
and $n \to \infty$; or (ii) $J$ being fixed, $n \to \infty$, $\delta \to 0$ but
$n\delta \to \infty$.

By Proposition 2 and condition (A.5), multiplying both sides of (3.4) by
$N^{-1}(\theta_0,J,\delta)$, we have

(4.1)    $\hat\theta^{(J)}_n - \theta_0 = N^{-1}U_n + N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0) - N^{-1}(N_n - N)(\hat\theta^{(J)}_n - \theta_0)$

$\qquad\quad - N^{-1}A_{n1}(\hat\theta^{(J)}_n,\theta_0) + N^{-1}A_{n2}(\hat\theta_n,\theta_0).$

From this together with Proposition 4 and Theorem 1, we can establish the
convergence rate of the AMLE.

Theorem 2. Under conditions (A.1)-(A.7) given in the Appendix,

$\hat\theta^{(J)}_n - \theta_0 = \begin{cases} O_p\{\delta^{J+1} + (n\delta)^{-1/2}\}, & \text{if } \delta \in (0, \Delta\wedge\tilde\Delta] \text{ is fixed and } J \to \infty; \\ O_p\{\delta^{J} + (n\delta)^{-1/2}\}, & \text{if } J \text{ is fixed, } \delta \to 0 \text{ but } n\delta^3 \to \infty. \end{cases}$

The above theorem reveals the impacts of the sampling interval $\delta$ and the
number of terms $J$ used in the density approximation on the convergence
rate. In particular, the rate of the AMLE has an extra $\delta^{J+1}$ or $\delta^{J}$
term in addition to the standard rate $(n\delta)^{-1/2}$ of the full MLE. This extra
term is the result of the density approximation, and its particular form suggests that
the sampling interval $\delta$ has to be less than 1 in order to make the AMLE
$\hat\theta^{(J)}_n$ converge to $\theta_0$. It is apparent that the higher $J$ is,
the less impact the extra term has on the AMLE $\hat\theta^{(J)}_n$.
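
To get a feel for the two terms in Theorem 2, their orders can be evaluated numerically with all constants set to one. That normalization is an assumption made purely for illustration (the theorem gives orders, not constants), and the helper name `amle_rate_order` is ours.

```python
def amle_rate_order(n, delta, J, delta_fixed):
    """Order of the AMLE error from Theorem 2, with all constants set to 1.

    delta_fixed=True  uses delta^(J+1) + (n * delta)^(-1/2)  (fixed delta, J -> infinity);
    delta_fixed=False uses delta^J     + (n * delta)^(-1/2)  (fixed J, delta -> 0).
    """
    extra = delta ** (J + 1) if delta_fixed else delta ** J
    return extra + (n * delta) ** -0.5

# Monthly data (delta = 1/12) with n = 1000 observations, fixed-J regime.
monthly_J1 = amle_rate_order(1000, 1 / 12, 1, delta_fixed=False)
monthly_J2 = amle_rate_order(1000, 1 / 12, 2, delta_fixed=False)

# A sampling interval above 1: the extra term grows with J instead of shrinking.
coarse_J1 = amle_rate_order(1000, 1.5, 1, delta_fixed=True)
coarse_J3 = amle_rate_order(1000, 1.5, 3, delta_fixed=True)
```

With $\delta = 1/12$, moving from $J = 1$ to $J = 2$ makes the approximation term small relative to $(n\delta)^{-1/2}$, whereas with $\delta > 1$ adding terms makes the bound worse, which is the reason $\delta < 1$ is needed.
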

5. Asymptotic distribution. In this section, we consider the asymptotic
distribution of the AMLE $\hat\theta^{(J)}_n$. Our investigations are organized
according to two asymptotic regimes: (i) $\delta$ fixed, $J \to \infty$ and
(ii) $J$ fixed, $\delta \to 0$ but $n\delta \to \infty$.
5.1. Fixed $\delta$, $J \to \infty$. This is a simple case to treat. Under this
setting, we note from Proposition 2 and condition (A.5) that
$N^{-1}(\theta_0,J,\delta) = O(1)$ uniformly for any $J$. Utilizing the result in
Theorem 2, expansion (4.1) becomes

$\hat\theta^{(J)}_n - \theta_0 = N^{-1}U_n + (\hat\theta_n - \theta_0) + O_p(n^{-1/2}\delta^{J-1/2} + n^{-1}\delta^{-1} + \delta^{2J+2}).$

Hence, noting that $U_n = O_p(\delta^{J+1})$,

$\sqrt{n}\,I^{1/2}(\delta)(\hat\theta^{(J)}_n - \theta_0) = \sqrt{n}\,I^{1/2}(\delta)(\hat\theta_n - \theta_0) + O_p(\delta^{J-1/2} + n^{-1/2}\delta^{-1} + n^{1/2}\delta^{J+1}).$

If $n\delta^{2J+2} \to 0$, then

$\sqrt{n}\,I^{1/2}(\delta)(\hat\theta^{(J)}_n - \theta_0) \stackrel{d}{\to} N(0, E_d).$

Table 1
Smallest number of terms $J$ that guarantees the AMLE has the same
asymptotic distribution as the full MLE, for selected sampling
intervals $\delta$ and sample sizes $n$

delta    n = 500   n = 1,000   n = 2,000   n = 4,000
1/252       1          1           1           1
1/52        1          1           1           1
1/12        1          1           1           1
1/4         2          2           2           2
1/2         4          4           5           5
3/4        10         12          13          14

Therefore, the AMLE has the same asymptotic distribution as the full
MLE $\hat\theta_n$. This is attained by requesting $n\delta^{2J+2} \to 0$ in
addition to $J \to \infty$. If $n\delta^{2J+2} \to c > 0$, the AMLE is still
asymptotically normal but would have an inflated variance due to the contribution
from the first term involving $U_n$. Apart from this, the asymptotic mean will no
longer be zero. Hence, it is much more desirable to have $n\delta^{2J+2} \to 0$. The
latter condition prescribes a rule on the selection of $J = J_n(\delta)$. Choosing an
$\epsilon > 0$ so that $\delta^{2J+2} = n^{-1-\epsilon}$ for each pair of $n$ and
$\delta$ gives

$J = J_n(\delta) = -\frac{1+\epsilon}{2\log\delta}\log n - 1 > -\frac{1}{2\log\delta}\log n - 1.$

The integer truncation of the above lower bound plus one can be used as
a reference value for the number of terms used in the density approximation
for each given pair of $(n, \delta)$.

Table 1 reports such reference values of $J$ assigned by the above formula
for a set of $(n, \delta)$ combinations commonly encountered in empirical studies.
It shows that for sampling intervals of a month or shorter ($\delta \le 1/12$), a
one-term approximation is adequate, and for $\delta = 1/4$, $J = 2$ is needed.
However, there is a dramatic increase in $J$ as the sampling length becomes larger
than 1/4: at least four terms are needed for $\delta = 1/2$ (half yearly) and at
least ten terms for $\delta = 3/4$. The number of terms also increases for these
higher $\delta$ values as $n$ increases, although the rate of this increase is much
slower than that as $\delta$ is increased. The latter may be understood as follows:
for a given $\delta$, as $n$ increases, the chance of having extreme values in the
tails of the transition distribution increases. As the density approximation is less
accurate in the tails than in the main body of the distribution, there is a need
for having more terms in the density approximation.
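
The rule for the reference value of $J$ is easy to put into code. The sketch below follows the description above (integer truncation of the lower bound, plus one); flooring the result at 1 is our own assumption, made because at least one term is always used and the lower bound is negative for the smallest sampling intervals. The function name is ours.

```python
import math

def reference_J(n, delta):
    """Reference number of terms: truncate -log(n) / (2 * log(delta)) - 1, then add one.

    The result is floored at 1 (an assumption of this sketch) since the lower
    bound can be negative when delta is very small.
    """
    lower_bound = -math.log(n) / (2.0 * math.log(delta)) - 1.0
    return max(1, math.floor(lower_bound) + 1)

deltas = [1 / 252, 1 / 52, 1 / 12, 1 / 4, 1 / 2, 3 / 4]
sizes = [500, 1000, 2000, 4000]
# One row per sampling interval, one column per sample size.
rows = [[reference_J(n, d) for n in sizes] for d in deltas]
```

Each entry of `rows` can be compared directly with the corresponding row of Table 1, which gives a quick way to extend the table to other $(n, \delta)$ pairs.
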

5.2. $J$ fixed, $\delta \to 0$ but $n\delta \to \infty$. Our starting point is the
expansion (4.1). As $N_n - N = O_p\{(n\delta)^{-1/2}\}$, $N^{-1}(N_n - N) = o_p(1)$ if
$n\delta^3 \to \infty$, which is also required in the asymptotic normality of the
full MLE as outlined in Proposition 6. We will show in the following that
$n\delta^3 \to \infty$ is also necessary to ensure that the AMLE shares the same
asymptotic distribution as the full MLE. It is understood that in order for
$\hat\theta^{(J)}_n$ to have the same asymptotic distribution as $\hat\theta_n$, it
is required that

$N^{-1}U_n$, $N^{-1}A_{n1}(\hat\theta^{(J)}_n,\theta_0)$ and $N^{-1}A_{n2}(\hat\theta_n,\theta_0)$ are all $o_p\{\|\hat\theta^{(J)}_n - \theta_0\|_2\}$.

We will demonstrate in the following that the above requirements can be
attained by $n\delta^3 \to \infty$ and $J \ge 2$. Hence, under these circumstances,
$\hat\theta^{(J)}_n$ has the same asymptotic distribution as $\hat\theta_n$. Later we
will demonstrate that this equivalence in the asymptotic distribution is quite
unlikely for $J = 1$. Our analysis needs to expand (3.2) to the quadratic terms. To
this end, let us define

$M_n(\theta_0,J,\delta) = n^{-1}\sum_{t=1}^{n}\nabla^3_{\theta\theta\theta}\log f^{(J)}(X_t|X_{t-1},\delta;\theta_0)$

and

$T_n(\theta_0,J,\delta) = n^{-1}\sum_{t=1}^{n}\nabla^3_{\theta\theta\theta}\log f(X_t|X_{t-1},\delta;\theta_0).$

By further expanding to quadratic terms, (4.1) can be written as

(5.1)    $\hat\theta^{(J)}_n - \theta_0 = N^{-1}U_n + N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0) - N^{-1}(N_n - N)(\hat\theta^{(J)}_n - \theta_0)$

$\qquad\quad - \frac{1}{2}N^{-1}[E_d\otimes(\hat\theta^{(J)}_n - \theta_0)']M_n(\hat\theta^{(J)}_n - \theta_0)$

$\qquad\quad + \frac{1}{2}N^{-1}[E_d\otimes(\hat\theta_n - \theta_0)']T_n(\hat\theta_n - \theta_0)$

$\qquad\quad - N^{-1}A_{n1}(\hat\theta^{(J)}_n,\theta_0) + N^{-1}A_{n2}(\hat\theta_n,\theta_0),$

where $A_{n1}(\hat\theta^{(J)}_n,\theta_0)$ and $A_{n2}(\hat\theta_n,\theta_0)$ now
denote the remainder terms of the quadratic expansion. Using the same method as in
the proof of Proposition 1, it can be shown that
$A_{n1}(\hat\theta^{(J)}_n,\theta_0) = O_p\{\|\hat\theta^{(J)}_n - \theta_0\|_2^3\}$ and
$A_{n2}(\hat\theta_n,\theta_0) = O_p\{\|\hat\theta_n - \theta_0\|_2^3\}$.

In order to make $\hat\theta^{(J)}_n$ have the same asymptotic distribution as
$\hat\theta_n$, the two quadratic terms on the right-hand side of (5.1) have to be
of smaller order than $\hat\theta^{(J)}_n - \theta_0$ and $\hat\theta_n - \theta_0$,
respectively, namely

$N^{-1}[E_d\otimes(\hat\theta^{(J)}_n - \theta_0)']M_n(\hat\theta^{(J)}_n - \theta_0) = o_p\{\|\hat\theta^{(J)}_n - \theta_0\|_2\}$

or equivalently

(5.2)    $N^{-1}[E_d\otimes(\hat\theta^{(J)}_n - \theta_0)'] = o_p(1)$

and

$N^{-1}[E_d\otimes(\hat\theta_n - \theta_0)']T_n(\hat\theta_n - \theta_0) = o_p\{\|\hat\theta_n - \theta_0\|_2\}$

or equivalently

(5.3)    $n\delta^3 \to \infty$

since $\hat\theta_n - \theta_0 = O_p\{(n\delta)^{-1/2}\}$ and $N^{-1} = O(\delta^{-1})$.

As $\hat\theta^{(J)}_n - \theta_0 = O_p\{\delta^{J} + (n\delta)^{-1/2}\}$, (5.2) requires
that $\delta^{J-1} + n^{-1/2}\delta^{-3/2} \to 0$. Hence, in order to make
$\hat\theta^{(J)}_n$ have the same asymptotic distribution as $\hat\theta_n$,
it is necessary to have

(5.4)    $J \ge 2$ and $n\delta^3 \to \infty$.

Now we consider the case of $J = 1$. To ensure that the remainder terms
$N^{-1}A_{n1}(\hat\theta^{(1)}_n,\theta_0)$ and $N^{-1}A_{n2}(\hat\theta_n,\theta_0)$ are
negligible, by an argument similar to that applied above for the case of $J \ge 2$,
it is also necessary to assume $n\delta^3 \to \infty$. From Theorem 2,
$\hat\theta^{(1)}_n - \theta_0 = O_p\{\delta + (n\delta)^{-1/2}\}$. To gain insight
on the situation, we need to find out the order of magnitude of the quadratic term
in (5.1), namely the order of magnitude of

$S_n = N^{-1}[E_d\otimes(\hat\theta^{(1)}_n - \theta_0)']M_n(\hat\theta^{(1)}_n - \theta_0) - N^{-1}[E_d\otimes(\hat\theta_n - \theta_0)']T_n(\hat\theta_n - \theta_0).$

With this notation, (5.1) can be written as

(5.5)    $\hat\theta^{(1)}_n - \theta_0 = N^{-1}U_n + N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0) - \frac{1}{2}S_n + o_p\{(n\delta)^{-1/2}\} + O_p(\delta^2).$

Define an operator between two vectors $A$ and $B$,

$A * B = [E_d\otimes A']M_nB + [E_d\otimes B']M_nA.$

By repeated substitutions, it can be shown that

$S_n = \frac{1}{2}N^{-1}[(N^{-1}U_n)*(N^{-1}U_n)] + \frac{1}{2}N^{-1}[(\tfrac{1}{2}S_n)*(\tfrac{1}{2}S_n)] - N^{-1}[(N^{-1}U_n)*(\tfrac{1}{2}S_n)] + o_p(\delta).$

As $U_n = O_p(\delta^2)$ for $J = 1$ and $N^{-1} = O(\delta^{-1})$, it can be deduced
from the above equation that $S_n = O_p(\delta)$. Hence, for $J = 1$, if we require
$n\delta^3 \to \infty$, the quadratic term $S_n$ will contribute to the leading order
of $\hat\theta^{(1)}_n - \theta_0$. If we do not require $n\delta^3 \to \infty$, then
the sum of remainder terms, $N^{-1}A_{n1}(\hat\theta^{(1)}_n,\theta_0) + N^{-1}A_{n2}(\hat\theta_n,\theta_0)$,
will not be controlled. Hence, if $J = 1$, it is very likely that the asymptotic
distribution of $\hat\theta^{(1)}_n$ will differ from that of $\hat\theta_n$ unless
$U_n = 0$ with probability one. In the rare case of $U_n = 0$, it is possible for
$\hat\theta^{(1)}_n$ and $\hat\theta_n$ to share the same limiting distribution.

Therefore, in order to guarantee that $\hat\theta^{(J)}_n$ has the same asymptotic
distribution as $\hat\theta_n$ under $\delta \to 0$, we need to use the AMLE based on
at least two-term expansions, while satisfying $n\delta^3 \to \infty$, which we will
assume in the rest of this section.

Note that $\hat\theta^{(J)}_n - \theta_0 = O_p\{\delta^{J} + (n\delta)^{-1/2}\}$. Then,

$\hat\theta^{(J)}_n - \theta_0 = N^{-1}U_n + (\hat\theta_n - \theta_0) + O_p(n^{-1/2}\delta^{J-3/2}) + N^{-1}\cdot O_p(\delta^{2J} + n^{-1}\delta^{-1}).$

Furthermore,

$\sqrt{n}\,I^{1/2}(\delta)(\hat\theta^{(J)}_n - \theta_0)$

$\qquad = \sqrt{n}\,I^{-1/2}(\delta)I(\delta)N^{-1}U_n + \sqrt{n}\,I^{1/2}(\delta)(\hat\theta_n - \theta_0) + O_p(\delta^{J-3/2})$

$\qquad\quad + \sqrt{n}\,I^{-1/2}(\delta)I(\delta)N^{-1}\cdot O_p(\delta^{2J} + n^{-1}\delta^{-1})$

$\qquad = \sqrt{n}\,I^{1/2}(\delta)(\hat\theta_n - \theta_0) + O_p(\delta^{J-3/2} + n^{-1/2}\delta^{-3/2} + n^{1/2}\delta^{J+1/2}).$

Hence, for any $J \ge 2$ such that $n\delta^3 \to \infty$ and $n\delta^{2J+1} \to 0$,

$\sqrt{n}\,I^{1/2}(\delta)(\hat\theta^{(J)}_n - \theta_0) \stackrel{d}{\to} N(0, E_d).$

This result shows that, when $\delta$ vanishes to zero, in order to guarantee that the
AMLE has the same asymptotic distribution as the full MLE, we need to pick
the approximation order $J \ge 2$, while maintaining $n\delta^3 \to \infty$ and
$n\delta^{2J+1} \to 0$. The following theorem summarizes the asymptotic normality
under both asymptotic regimes.

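Theorem 3. Under conditions (A.1)-(A.7) given in the Appendix,

$\sqrt{n}\,I^{1/2}(\delta)(\hat\theta^{(J)}_n - \theta_0) \stackrel{d}{\to} N(0, E_d)$

for: (i) $\delta \in (0, \Delta\wedge\tilde\Delta]$ being fixed, $n \to \infty$,
$J \to \infty$ but $n\delta^{2J+2} \to 0$; or (ii) $J \ge 2$ being fixed,
$n \to \infty$, $\delta \to 0$ but $n\delta^3 \to \infty$ and $n\delta^{2J+1} \to 0$.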
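
The two rate conditions in regime (ii) can be made concrete under a polynomial sampling scheme $\delta = n^{-a}$ with $a > 0$, which is our own illustrative assumption and not a restriction imposed by the theorem. Then $n\delta^3 \to \infty$ holds exactly when $a < 1/3$, and $n\delta^{2J+1} \to 0$ holds exactly when $a > 1/(2J+1)$. The sketch below returns the resulting interval of admissible exponents; the function name is ours.

```python
from fractions import Fraction

def admissible_exponents(J):
    """Open interval of exponents a for which delta = n^(-a) meets both rate conditions.

    n * delta^3 -> infinity        requires a < 1/3;
    n * delta^(2J + 1) -> 0        requires a > 1/(2J + 1).
    Returns None when the interval is empty.
    """
    lower, upper = Fraction(1, 2 * J + 1), Fraction(1, 3)
    return (lower, upper) if lower < upper else None

interval_J1 = admissible_exponents(1)   # both bounds equal 1/3: empty
interval_J2 = admissible_exponents(2)   # (1/5, 1/3)
interval_J3 = admissible_exponents(3)   # (1/7, 1/3)
```

For $J = 1$ the interval is empty, so no polynomial rate satisfies both conditions, which is consistent with the requirement $J \ge 2$; the window widens as $J$ grows.
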

5.3. Asymptotic bias and variance. The remainder of this section is
devoted to the consideration of the asymptotic bias and variance of the AMLE
under the two asymptotic regimes. Given our analysis in the early part of
this section, our consideration will be focused on the situations where the
asymptotic normality of the AMLE can be assumed, namely under: (i) $\delta$
being fixed, $J \to \infty$, $n \to \infty$ but $n\delta^{2J+2} \to 0$; or
(ii) $J \ge 2$ being fixed, $\delta \to 0$, $n\delta^3 \to \infty$ but
$n\delta^{2J+1} \to 0$.

In the case of $\delta$ being fixed and $J \to \infty$, from (5.1) and provided
$n\delta^{2J+2} \to 0$, we have

$\hat\theta^{(J)}_n - \theta_0 = N^{-1}U_n + N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0) - N^{-1}(N_n - N)N^{-1}U_n$

$\qquad\quad - N^{-1}(N_n - N)N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0)$

$\qquad\quad - \frac{1}{2}N^{-1}\{E_d\otimes[N^{-1}U_n + N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0)]'\}$

$\qquad\qquad \times M_n[N^{-1}U_n + N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0)]$

$\qquad\quad + \frac{1}{2}N^{-1}[E_d\otimes(\hat\theta_n - \theta_0)']T_n(\hat\theta_n - \theta_0) + O_p(n^{-3/2})$

$\qquad = N^{-1}U_n + [E_d - N^{-1}(N_n - N)]N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0)$

$\qquad\quad + O_p(n^{-1/2}\delta^{J+1}) + O_p(n^{-3/2}).$

Then, the leading order bias of $\hat\theta^{(J)}_n$ is

(5.6)    $B(\theta_0,J,\delta) = N^{-1}U + E\{[E_d - N^{-1}(N_n - N)]N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0)\},$

and the leading order variance is

(5.7)    $V(\theta_0,J,\delta) = N^{-1}I(\delta)\,\mathrm{Var}(\hat\theta_n)\,I(\delta)N^{-1}.$

In the case of $J \ge 2$ being fixed, $\delta \to 0$ and $n\delta^3 \to \infty$ but
$n\delta^{2J+1} \to 0$, it can be shown by an argument similar to that for the fixed
$\delta$ case above that the asymptotic bias and variance have the same forms as
(5.6) and (5.7), respectively. Both (5.6) and (5.7) will be compared with the
simulated bias and variance in the simulation study in Section 7. For $J = 1$ and
$\delta \to 0$, there are difficulties in obtaining an expression for the bias of
the AMLE in general due to the same dilemma in controlling the remainder terms and
the quadratic term $S_n$ as outlined in Section 5.2.

6. Approximating Fisher information matrix. We demonstrate in this section that the approximation of the transition density provides a way to approximate the Fisher information matrix. The Fisher information matrix $I(\delta)$ is a key quantity associated with inference based on the full MLE. It defines the asymptotic efficiency and convergence rate. From Proposition 2, a natural candidate to approximate $I(\delta)$ is $-N(\theta_0, J, \delta)$ based on the $J$-term expansion. To simplify the exposition, we focus on the following diffusion process:

(6.1) $dX_t = \mu(X_t; \eta)\,dt + \sigma(X_t; \xi)\,dB_t,$

where $\eta = (\eta_1, \ldots, \eta_{d_1})'$ and $\xi = (\xi_1, \ldots, \xi_{d_2})'$ are distinct drift and diffusion parameters, respectively. The whole parameter is $\theta = (\eta', \xi')'$. Here, we provide an explicit expression for $N(\theta_0, 1, \delta)$ based on the one-term density expansion. Expressions for higher $J$ values may be obtained via more extensive derivations.
Let $\mu_i$, $\mu_{ij}$ and so on denote partial derivatives of $\mu$ with respect to $\eta_i$, and $\eta_i$ and $\eta_j$, respectively; and $\sigma_i$ and $\sigma_{xj}$ and so on denote partial derivatives of $\sigma$ with respect to $\xi_i$, and $x$ and $\xi_j$, respectively. By the one-term ($J = 1$) transition density approximation, derivations given in Chang and Chen (2011) show that
and

$$E\left\{\frac{\partial^2 \log f^{(1)}}{\partial \xi_i\, \partial \xi_j}\right\} = -2E(\sigma^{-2}\sigma_i\sigma_j) + \delta \cdot N^{(3)}_{ij} + O(\delta^2),$$

where

$$N^{(1)}_{ij} = E\{-\sigma^{-2}\mu_i\mu_j - \mu\sigma^{-2}\mu_{ij} + \sigma^{-1}\mu_{ij}\sigma_x - \tfrac{1}{2}\mu_{xij}\},$$

$$N^{(2)}_{ij} = E\{2\mu\sigma^{-3}\mu_i\sigma_j - \sigma^{-2}\mu_i\sigma_x\sigma_j + \sigma^{-1}\mu_i\sigma_{xj}\},$$

( 1 I r I 'i O Q O 

^2 = E{— 6/U cr~ cJjCJj + 1 6//<7 cr-rCTjCTj + 2/i a (Tij — 3a~ [i x OiOj 
- ^a~ 2 a 1 x a i aj - \iio~ 2 a x Gij - 5fia~ 2 a xi aj - 5fj,a~ 2 a xj ai 

+ &~ HxCij + 4CT~ G xx OiOj + -g-CJ - <J x O x iGj + -ycr" a x (J x j(Ji 
-\- 2& ®x®ij ' 2^^ 0~xij ~£&xxG%j ~2@xi@xj ~2~0~xO~xij 

&xxi®j 0~xxjO~i ~r ^G&xxij J • 

Thus 

(6.2) iV(0 o ,M) = £, 12 +0(5 2 ). 

We learn from Proposition 2 that $-N(\theta_0, 1, \delta)$ provides a leading order approximation to $I(\delta)$ with a remainder term of order $\delta^2$. Given the asymptotic normality of the full MLE $\hat\theta_n$ as conveyed by Proposition 6, equation (6.2) confirms that, as $\delta \to 0$, the convergence rate of the full MLE for the drift parameters $\eta$ is $(n\delta)^{-1/2}$, whereas that for the diffusion parameters $\xi$ is $n^{-1/2}$, faster than the drift parameter estimator. Our study confirms the results of Gobet (2002), Sørensen (2007) and Tang and Chen (2009).

In the rest of the section, we will derive the Fisher information matrix 
approximation for two specific diffusion processes. Both are widely employed 
in modeling of the interest rate dynamics. 

6.1. Vasicek model. Consider the Vasicek (1977) model, 

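(6.3) $dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dB_t,$

which is also the Ornstein-Uhlenbeck process. The conditional distribution of $X_t$ given $X_{t-1}$ is

$$X_t \mid X_{t-1} \sim N\{X_{t-1}e^{-\kappa\delta} + \alpha(1 - e^{-\kappa\delta}),\ \tfrac{1}{2}\sigma^2\kappa^{-1}(1 - e^{-2\kappa\delta})\},$$

and the stationary distribution of $\{X_t\}$ is $N(\alpha, \sigma^2/(2\kappa))$. It follows that the information matrix of $\theta = (\kappa, \alpha, \sigma)'$ is $I(\delta) = (I_{ij})_{3\times 3}$ where

$$I_{11} = \frac{1}{2\kappa^2} + \frac{\delta[\kappa\delta + \kappa\delta e^{2\kappa\delta} - 2e^{2\kappa\delta} + 2]}{\kappa(e^{2\kappa\delta} - 1)^2} = \frac{\delta}{2\kappa} + O(\delta^2), \qquad I_{12} = I_{21} = 0,$$

$$I_{13} = I_{31} = \frac{(1 + 2\kappa\delta) - e^{2\kappa\delta}}{\kappa\sigma(e^{2\kappa\delta} - 1)} = -\frac{\delta}{\sigma} + O(\delta^2),$$

$$I_{22} = \frac{2\kappa(e^{\kappa\delta} - 1)^2}{\sigma^2(e^{2\kappa\delta} - 1)} = \frac{\kappa^2\delta}{\sigma^2} + O(\delta^2), \qquad I_{23} = I_{32} = 0 \quad\text{and}\quad I_{33} = \frac{2}{\sigma^2}.$$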

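These mean that

(6.4) $$I(\delta) = \begin{pmatrix} \delta\cdot(2\kappa)^{-1} & 0 & -\delta\cdot\sigma^{-1} \\ 0 & \delta\cdot\kappa^2\sigma^{-2} & 0 \\ -\delta\cdot\sigma^{-1} & 0 & 2\sigma^{-2} \end{pmatrix} + O(\delta^2).$$

Hence $I(0) = \lim_{\delta\to 0} I(\delta)$ is singular, an issue we have raised earlier, which is why condition (A.5) assumes that the largest eigenvalue of $\delta I^{-1}(\delta)$ is bounded.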

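Using the approximation formula in (6.2), we have

$$N(\theta, 1, \delta) = \begin{pmatrix} -\delta\cdot(2\kappa)^{-1} & 0 & \delta\cdot\sigma^{-1} \\ 0 & -\delta\cdot\kappa^2\sigma^{-2} & 0 \\ \delta\cdot\sigma^{-1} & 0 & -2\sigma^{-2} \end{pmatrix} + O(\delta^2).$$

This means the leading order term of $-N(\theta, 1, \delta)$ is identical to that of the true Fisher information matrix in (6.4).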
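The agreement between the exact Vasicek information matrix and its leading order term can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the entries coded are those of the exact information matrix and of its leading order term for the Vasicek model as given in this subsection.

```python
import math


def vasicek_fisher_exact(kappa, alpha, sigma, delta):
    """Exact Fisher information I(delta) for theta = (kappa, alpha, sigma).

    alpha is accepted for interface symmetry; the information matrix of the
    Vasicek model does not depend on it.
    """
    e2 = math.expm1(2.0 * kappa * delta)  # e^{2 kappa delta} - 1
    e1 = math.expm1(kappa * delta)        # e^{kappa delta} - 1
    # kd + kd*e^{2kd} - 2 e^{2kd} + 2, rewritten in terms of e2 for stability
    bracket = kappa * delta + kappa * delta * (e2 + 1.0) - 2.0 * e2
    i11 = 1.0 / (2.0 * kappa ** 2) + delta * bracket / (kappa * e2 ** 2)
    i13 = (2.0 * kappa * delta - e2) / (kappa * sigma * e2)
    i22 = 2.0 * kappa * e1 ** 2 / (sigma ** 2 * e2)
    i33 = 2.0 / sigma ** 2
    return [[i11, 0.0, i13],
            [0.0, i22, 0.0],
            [i13, 0.0, i33]]


def vasicek_fisher_leading(kappa, alpha, sigma, delta):
    """Leading order term of I(delta), i.e. the matrix displayed in (6.4)."""
    return [[delta / (2.0 * kappa), 0.0, -delta / sigma],
            [0.0, delta * kappa ** 2 / sigma ** 2, 0.0],
            [-delta / sigma, 0.0, 2.0 / sigma ** 2]]


if __name__ == "__main__":
    theta = (0.858, 0.0891, 0.0468)  # Vasicek parameters used in Section 7
    for delta in (1.0 / 4.0, 1.0 / 12.0):
        exact = vasicek_fisher_exact(*theta, delta)
        lead = vasicek_fisher_leading(*theta, delta)
        for (i, j) in ((0, 0), (0, 2), (1, 1), (2, 2)):
            rel = abs(exact[i][j] - lead[i][j]) / abs(exact[i][j])
            print(f"delta={delta:.4f} entry ({i + 1},{j + 1}) relative gap {rel:.2e}")
```

Running the script prints the relative gap between the two matrices entry by entry, so the shrinkage of the gap as $\delta$ decreases can be inspected directly.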

6.2. Cox-Ingersoll-Ross model. Consider the Cox-Ingersoll-Ross (CIR) 
model [Cox, Ingersoll and Ross (1985)], 

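(6.5) $dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t,$

which is also Feller's (1952) square root process.

Let $\theta = (\kappa, \alpha, \sigma)'$ and $c = 4\kappa\sigma^{-2}(1 - e^{-\kappa\delta})^{-1}$. The conditional distribution of $cX_t$ given $X_{t-1}$ is

$$cX_t \mid X_{t-1} \sim \chi^2_{\nu}(\lambda),$$

where the distribution is a noncentral $\chi^2$ distribution with degrees of freedom $\nu = 4\kappa\alpha\sigma^{-2}$ and noncentrality parameter $\lambda = cX_{t-1}e^{-\kappa\delta}$. The transition density of $X_t$ given $X_{t-1}$ is

$$f(X_t \mid X_{t-1}, \delta; \theta) = \frac{c}{2}\, e^{-u-v}\left(\frac{v}{u}\right)^{q/2} I_q(2\sqrt{uv}),$$

where $u = cX_{t-1}e^{-\kappa\delta}/2$, $v = cX_t/2$, $q = 2\kappa\alpha/\sigma^2 - 1 > 0$, and $I_q$ is the modified Bessel function of the first kind of order $q$. If $2\kappa\alpha > \sigma^2$, then the stationary distribution of $\{X_t\}$ is the Gamma distribution with shape parameter $2\kappa\alpha/\sigma^2$ and scale parameter $\sigma^2/(2\kappa)$.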

Although the second partial derivatives of the log transition density function can be derived after some labor that involves differentiating the modified Bessel function of the first kind, obtaining an expression for the Fisher information matrix is a rather hard task, largely due to the difficulty in deriving the expectations. In contrast, using the approximation formula (6.2), we can obtain the approximation for the negative of the Fisher information matrix,

$$N(\theta_0, 1, \delta) = \begin{pmatrix} N_{11} & N_{12} & N_{13} \\ N_{21} & N_{22} & N_{23} \\ N_{31} & N_{32} & N_{33} \end{pmatrix} + O(\delta^2),$$



where 



. a 2 a 2 — 2kq 2 + aa 2 
N n = 6- 



N 2 = N 2 i =5- 



JVl3 = Mll =-*• 



a 4 - 2k<mt 2 

Anaa 2 — <r 4 — 8k 2 a + Ana 2 



2cr 4 - 4Acaa 2 
2na 2 (T 2 - An 2 a 2 + 2naa 2 



a 5 — 2Kaa 3 
N 22 = 5 ■ 



N 23 = -5 ■ 



a 2 — 2na 
2n 2 aa 2 - 4n 3 a + 2k 2 a 2 



2Kao~ [j 



and 



N 33 = -t. + § . <24,K 2 a 2 a 2 - 48n 3 a 2 + 48k 2 aa 2 



(7- 

- 2AKaa* + 36kct 4 + 4a 5 + 9ct 6 )(4ct 6 - 8KtWT 4 ) _1 . 

Using $-N(\theta_0, 1, \delta)$, we can get the approximation of the Fisher information matrix. This approximation may be used in carrying out statistical inference on the CIR processes.

6.3. Observed Fisher information. The major application of the asymptotic normality of both the full and approximate MLEs is statistical inference for $\theta$, which includes confidence regions and hypothesis tests for $\theta$. For such purposes, the Fisher information $I(\delta)$ needs to be estimated. A natural candidate would be $-N_n(\hat\theta_n^{(J)}, J, \delta)$. Although it converges to $I(\delta)$ at the rate of $O_p\{(n\delta)^{-1/2} + \delta^{J}\}$ or $O_p\{(n\delta)^{-1/2} + \delta^{J+1}\}$, depending on whether $\delta$ is fixed or diminishing, $-N_n(\hat\theta_n^{(J)}, J, \delta)$ may not be nonnegative definite, which can hinder the computation of $\{-N_n(\hat\theta_n^{(J)}, J, \delta)\}^{1/2}$. To get around this issue, by noticing that $I(\delta)$ is the variance of the likelihood score, we consider

$$I_n(\theta, J, \delta) = \frac{1}{n}\sum_{t=1}^{n} [\nabla_\theta \log f^{(J)}(X_t \mid X_{t-1}, \delta; \theta)][\nabla_\theta \log f^{(J)}(X_t \mid X_{t-1}, \delta; \theta)]'$$

as an estimator of $I(\delta)$. The following theorem shows that the conclusion of Theorem 3 continues to hold when $I(\delta)$ is replaced with $I_n(\hat\theta_n^{(J)}, J, \delta)$.

Theorem 4. Under conditions (A.1)-(A.7) given in the Appendix,

$$\sqrt{n}\, I_n^{1/2}(\hat\theta_n^{(J)}, J, \delta)(\hat\theta_n^{(J)} - \theta_0) \xrightarrow{d} N(0, E_d)$$

for: (i) $\delta \in (0, \Delta \wedge \bar\Delta]$ being fixed, $n \to \infty$, $J \to \infty$ but $n\delta^{2J+2} \to 0$; or (ii) $J \ge 2$ being fixed, $n \to \infty$, $\delta \to 0$ but $n\delta^{3} \to \infty$ and $n\delta^{2J+1} \to 0$.

Confidence regions and hypothesis tests can be readily carried out by utilizing the above results.
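The outer-product estimator of Section 6.3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it substitutes the exact Gaussian transition density of the Vasicek model for the $J$-term approximation $f^{(J)}$, uses central finite differences for the score, and all function names are hypothetical.

```python
import math
import random


def simulate_vasicek(n, kappa, alpha, sigma, delta, rng):
    """Exact simulation of n transitions, started from the stationary law."""
    phi = math.exp(-kappa * delta)
    sd = math.sqrt(sigma ** 2 * (1.0 - phi ** 2) / (2.0 * kappa))
    path = [rng.gauss(alpha, sigma / math.sqrt(2.0 * kappa))]
    for _ in range(n):
        path.append(alpha + phi * (path[-1] - alpha) + sd * rng.gauss(0.0, 1.0))
    return path


def log_density(x1, x0, theta, delta):
    """Log of the exact Vasicek transition density of x1 given x0."""
    kappa, alpha, sigma = theta
    phi = math.exp(-kappa * delta)
    mean = alpha + phi * (x0 - alpha)
    var = sigma ** 2 * (1.0 - phi ** 2) / (2.0 * kappa)
    return -0.5 * math.log(2.0 * math.pi * var) - (x1 - mean) ** 2 / (2.0 * var)


def score(x1, x0, theta, delta, h=1e-6):
    """Central-difference approximation to the gradient of the log density."""
    grad = []
    for i in range(len(theta)):
        step = h * max(1.0, abs(theta[i]))
        up, down = list(theta), list(theta)
        up[i] += step
        down[i] -= step
        diff = log_density(x1, x0, up, delta) - log_density(x1, x0, down, delta)
        grad.append(diff / (2.0 * step))
    return grad


def outer_product_information(path, theta, delta):
    """Average of score outer products over the n observed transitions."""
    n = len(path) - 1
    d = len(theta)
    info = [[0.0] * d for _ in range(d)]
    for t in range(1, n + 1):
        g = score(path[t], path[t - 1], theta, delta)
        for i in range(d):
            for j in range(d):
                info[i][j] += g[i] * g[j] / n
    return info


if __name__ == "__main__":
    rng = random.Random(2011)
    theta = (0.858, 0.0891, 0.0468)
    path = simulate_vasicek(2000, *theta, 1.0 / 12.0, rng)
    for row in outer_product_information(path, theta, 1.0 / 12.0):
        print(["%.4f" % value for value in row])
```

Because the estimate is an average of outer products, it is symmetric and nonnegative definite by construction, which is the property that motivates its use in place of $-N_n(\hat\theta_n^{(J)}, J, \delta)$.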

7. Simulation. We report results from simulation studies designed to confirm the theoretical findings on the AMLE reported in the earlier sections. To allow comparison with the full MLE, we considered the Vasicek and CIR diffusion models of the previous section, as both models permit the full MLE. Two asymptotic regimes were examined: the fixed $\delta$ and the diminishing $\delta$ with $n\delta^{3} \to \infty$.

The first part of the simulation concerns the case in which $\delta$ is fixed. The parameters used in the simulated Vasicek and CIR models were $\theta = (\kappa, \alpha, \sigma)' = (0.858, 0.0891, 0.0468)'$ and $\theta = (\kappa, \alpha, \sigma)' = (0.892, 0.09, 0.1817)'$, respectively. The sampling interval $\delta$ was 1/12 and 1/4, and the order of the density approximation $J$ was 1 and 2. For each $\delta$ and $J$, the sample size $n$ was set at 500, 1,000 and 2,000. In addition to bias and standard deviation, we consider

$$\mathrm{RMSD}(n, J, \delta) = \sqrt{E\|\hat\theta_n^{(J)} - \hat\theta_n\|_2^2},$$

the square root of the expected squared deviation between $\hat\theta_n^{(J)}$ and $\hat\theta_n$, as an overall performance measure.
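As an illustration of how this measure can be estimated from Monte Carlo output, the sketch below replaces the expectation with an average over replications. The function name and the input layout (one estimate vector per replication) are assumptions made for this example; Tables 2 and 3 report the measure parameter by parameter, which corresponds to passing one-dimensional vectors.

```python
import math


def rmsd(amle_estimates, mle_estimates):
    """Monte Carlo estimate of sqrt(E || theta_AMLE - theta_MLE ||_2^2).

    Both arguments are sequences of equal length; entry r holds the estimate
    vector obtained from the r-th simulated sample path.
    """
    if len(amle_estimates) != len(mle_estimates) or not amle_estimates:
        raise ValueError("need the same positive number of replications")
    total = 0.0
    for amle, mle in zip(amle_estimates, mle_estimates):
        total += sum((a - m) ** 2 for a, m in zip(amle, mle))
    return math.sqrt(total / len(amle_estimates))


if __name__ == "__main__":
    # Two toy replications of a three-dimensional estimate (made-up numbers).
    amle = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
    mle = [[1.0, 2.0, 0.0], [0.0, 4.0, 0.0]]
    print(rmsd(amle, mle))
```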

Tables 2 and 3 summarize the simulation for the fixed δ case. They report the average bias and standard deviation (SD) for the full MLE and AMLEs with J = 1 and J = 2, as well as the RMSD between the AMLEs and the full MLE, for both the Vasicek and the CIR models. To give the simulation results more perspective and to confirm the approximate bias and variance formulas derived in Section 5, we also computed the asymptotic bias and standard deviation based on formulas (5.6) and (5.7). We observe from Tables 2 and 3 that at each δ (1/12 and 1/4), the bias and the standard deviation of all the estimators for the three parameters became smaller as n increased. This confirms the consistency of the estimators. The tables also show good agreement among the three estimators in terms of the performance measures. The bias and the variance of the AMLE with J = 1 and J = 2 were quite comparable to each other. However, comparing the RMSD, it is clear that in most cases (except for n = 500 in the CIR model), the RMSD for J = 2 was smaller than that for J = 1, signaling that the AMLE with J = 2 was closer to the full MLE than the AMLE with J = 1. This indicates that the AMLEs with J = 2 were indeed closer to the full MLE than those with J = 1, as confirmed by our earlier analysis. The asymptotic bias and standard deviation predicted for the AMLE with J = 1 and 2 offer more insight, and show good agreement between the simulated results and the values predicted by the theory, which is reassuring. We also observe that for δ = 1/4, the AMLE with J = 2 performs better than the AMLE with J = 1, which is in line with Table 1, where J = 2 is preferred to J = 1 at this frequency. When δ was fixed at 1/12, the performance of J = 1 and J = 2 was largely similar.

Table 2
Simulated average bias (Bias) and standard deviations (SD) of the full MLE and two AMLEs with J = 1 and 2 for Vasicek model (κ = 0.858, α = 0.0891, σ = 0.0468); A.Bias and A.SD are asymptotic bias and SD based on formulas (5.6) and (5.7); RMSD is the root of mean square deviation between the AMLE and the full MLE

                    δ = 1/12                      δ = 1/4
n      Statistic    MLE       J = 1     J = 2     MLE       J = 1     J = 2
500    Bias      κ  0.0992    0.0896    0.0992    0.0380    0.0127    0.0396
                 α  0.0002    0.0002    0.0002    4.09e-5   5.63e-5   4.17e-5
                 σ  4.39e-5   4.14e-5   4.39e-5   9.12e-5   7.13e-5   9.43e-5
       A.Bias    κ            0.0908    0.1016              0.0174    0.0376
                 α            0.0003    0.0002              0.0002    0.0001
                 σ            4.55e-5   4.55e-5             0.0001    0.0001
       SD        κ  0.2307    0.2255    0.2309    0.1366    0.1290    0.1386
                 α  0.0085    0.0085    0.0085    0.0050    0.0050    0.0050
                 σ  0.0016    0.0016    0.0016    0.0016    0.0016    0.0016
       A.SD      κ            0.2251    0.2366              0.1215    0.1403
                 α            0.0084    0.0085              0.0047    0.0050
                 σ            0.0016    0.0016              0.0016    0.0016
       RMSD      κ            0.0173    0.0062              0.0332    0.0316
                 α            0.0002    1.28e-5             0.0005    0.0002
                 σ            1.36e-5   1.05e-5             0.0001    0.0001
1,000  Bias      κ  0.0518    0.0419    0.0520    0.0170    -0.0095   0.0186
                 α  -0.0002   -0.0002   -0.0002   1.83e-5   2.81e-5   1.58e-5
                 σ  7.05e-5   6.68e-5   7.06e-5   3.66e-5   6.83e-6   3.96e-5
       A.Bias    κ            0.0446    0.0529              -0.0097   0.0161
                 α            -0.0001   -0.0002             1.69e-5   1.45e-5
                 σ            0.0001    0.0001              3.29e-5   4.55e-5
       SD        κ  0.1624    0.1586    0.1625    0.0957    0.0905    0.0966
                 α  0.0058    0.0058    0.0058    0.0034    0.0034    0.0034
                 σ  0.0011    0.0011    0.0011    0.0012    0.0012    0.0012
       A.SD      κ            0.1585    0.1666              0.0849    0.0982
                 α            0.0057    0.0058              0.0032    0.0034
                 σ            0.0011    0.0011              0.0012    0.0012
       RMSD      κ            0.0100    0.0008              0.0316    0.0063
                 α            0.0001    9.14e-6             0.0004    0.0001
                 σ            7.39e-6   7.80e-7             0.0001    1.59e-5
2,000  Bias      κ  0.0245    0.0149    0.0246    0.0084    -0.0191   0.0100
                 α  -3.97e-5  -3.34e-5  -4.01e-5  -5.72e-5  -4.90e-5  -5.80e-5
                 σ  2.69e-5   2.30e-5   2.70e-5   4.00e-5   9.21e-6   4.34e-5
       A.Bias    κ            0.0179    0.0249              -0.0085   0.0071
                 α            -2.63e-5  -2.98e-5            0.0001    -0.0001
                 σ            4.55e-5   4.55e-5             4.55e-5   4.55e-5
       SD        κ  0.1114    0.1091    0.1115    0.0647    0.0611    0.0652
                 α  0.0042    0.0041    0.0042    0.0024    0.0024    0.0024
                 σ  0.0008    0.0008    0.0008    0.0008    0.0008    0.0008
       A.SD      κ            0.1088    0.1143              0.0576    0.0665
                 α            0.0041    0.0042              0.0023    0.0024
                 σ            0.0008    0.0008              0.0008    0.0008
       RMSD      κ            0.0100    0.0006              0.0300    0.0042
                 α            0.0001    7.37e-6             0.0003    4.68e-5
                 σ            6.27e-6   7.80e-7             0.0001    1.02e-5

Table 3
Simulated average bias (Bias) and standard deviations (SD) of the full MLE and two AMLEs with J = 1 and 2 for CIR model (κ = 0.892, α = 0.09, σ = 0.1817); A.Bias and A.SD are asymptotic bias and SD based on formulas (5.6) and (5.7); RMSD is the root of mean square deviation between the AMLE and the full MLE

                    δ = 1/12                      δ = 1/4
n      Statistic    MLE       J = 1     J = 2     MLE       J = 1     J = 2
500    Bias      κ  0.0980    0.0910    0.0978    0.0371    0.0234    0.0388
                 α  0.0001    0.0004    0.0001    -6.38e-5  0.0008    -0.0001
                 σ  0.0003    0.0003    0.0003    0.0004    0.0005    0.0003
       A.Bias    κ            0.0818    0.0984              0.0207    0.0513
                 α            0.0005    0.0001              0.0008    -0.0001
                 σ            0.0003    0.0003              0.0004    0.0002
       SD        κ  0.2389    0.2340    0.2405    0.1437    0.1338    0.2256
                 α  0.0093    0.0093    0.0093    0.0055    0.0054    0.0055
                 σ  0.0060    0.0060    0.0060    0.0065    0.0065    0.0069
       A.SD      κ            0.2169    0.2389              0.1159    0.1938
                 α            0.0091    0.0093              0.0064    0.0055
                 σ            0.0060    0.0060              0.0067    0.0065
       RMSD      κ            0.0200    0.0224              0.0447    0.1622
                 α            0.0009    0.0004              0.0018    0.0004
                 σ            0.0004    0.0004              0.0017    0.0021
1,000  Bias      κ  0.0521    0.0435    0.0521    0.0218    0.0070    0.0186
                 α  -1.54e-5  0.0002    -2.22e-5  -0.0002   0.0007    -0.0003
                 σ  3.86e-5   4.35e-5   3.81e-5   0.0003    0.0006    0.0003
       A.Bias    κ            0.0411    0.0525              0.0095    0.0262
                 α            0.0004    -3.43e-5            0.0007    -0.0003
                 σ            3.17e-5   2.69e-5             0.0003    0.0001
       SD        κ  0.1596    0.1558    0.1603    0.0968    0.0861    0.0980
                 α  0.0067    0.0067    0.0067    0.0039    0.0037    0.0039
                 σ  0.0043    0.0043    0.0043    0.0045    0.0045    0.0045
       A.SD      κ            0.1452    0.1596              0.0823    0.0969
                 α            0.0066    0.0067              0.0044    0.0039
                 σ            0.0040    0.0043              0.0047    0.0045
       RMSD      κ            0.0173    0.0141              0.0447    0.0200
                 α            0.0003    2.66e-5             0.0020    0.0001
                 σ            0.0002    3.91e-5             0.0021    0.0002
2,000  Bias      κ  0.0295    0.0199    0.0294    0.0103    -0.0057   0.0069
                 α  -0.0002   0.0001    -0.0002   -3.06e-5  0.0010    -9.87e-5
                 σ  0.0002    0.0002    0.0002    3.05e-5   0.0006    1.33e-5
       A.Bias    κ            0.0213    0.0299              -0.0011   0.0147
                 α            0.0002    -0.0002             0.0006    -0.0001
                 σ            0.0002    0.0002              0.0005    1.06e-5
       SD        κ  0.1082    0.1053    0.1088    0.0696    0.0607    0.0698
                 α  0.0048    0.0048    0.0048    0.0028    0.0027    0.0028
                 σ  0.0030    0.0031    0.0030    0.0033    0.0037    0.0033
       A.SD      κ            0.1181    0.1105              0.0592    0.0697
                 α            0.0047    0.0048              0.0027    0.0028
                 σ            0.0030    0.0030              0.0034    0.0033
       RMSD      κ            0.0173    0.0068              0.0424    0.0100
                 α            0.0004    0.0001              0.0020    0.0001
                 σ            0.0005    0.0003              0.0027    0.0001

The second part of the simulation was devoted to the diminishing δ case. Here we wanted to confirm the differential behavior of the AMLEs in the limiting distribution between J = 1 and J ≥ 2, as revealed in Section 5. The Vasicek model with $\theta = (\kappa, \alpha, \sigma)' = (0.892, 0.09, 0.1817)'$ was considered. We created two scenarios: (i) $n\delta^{3} \to \infty$ and (ii) $n\delta^{3} \to 0$, while $\delta \to 0$. They were created by choosing $\delta = n^{-1/6}$ and $\delta = n^{-1/2}$, respectively, while selecting n = 500, 1,000, 2,000, 4,000 and 8,000 to create two streams of asymptotic sequences. For each n and δ, we generated the Vasicek sample paths 1,000 times. For each simulated sample path, we obtained the AMLEs $\hat\theta_n^{(J)}$ for J = 1 and 2, and computed the Wald statistics

$$W_n(J) = n(\hat\theta_n^{(J)} - \theta_0)' I(\delta)(\hat\theta_n^{(J)} - \theta_0).$$

If $\sqrt{n}\, I^{1/2}(\delta)(\hat\theta_n^{(J)} - \theta_0)$ is asymptotically standard normally distributed in $R^d$, then the Wald statistic $W_n(J) \xrightarrow{d} \chi^2_3$. Based on the 1,000 Wald statistics from the simulations, we then performed the Kolmogorov-Smirnov (K-S) test of $H_0: W_n(J) \sim \chi^2_3$ for each of the designed sequences of (n, δ) generated under the two scenarios. Table 4 reports the p-values of the test, which show that for J = 1, under both scenarios, the p-values of the K-S test became smaller, and hence the above null hypothesis was rejected as n increased. For J = 2, the p-values of the K-S test were sharply different between the two scenarios. In particular, the p-values were mostly quite large under the scenario of $n\delta^{3} \to \infty$, and they were largely significant (small) when δ was diminishing at the faster rate of $n^{-1/2}$ such that $n\delta^{3} \to 0$. These were consistent with our theoretical findings in Section 5.
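The Kolmogorov-Smirnov check of the Wald statistics against the chi-square reference law can be sketched with the standard library alone. The code below is an illustration under stated assumptions rather than the procedure actually used in the study: the reference law is taken to be chi-square with 3 degrees of freedom (the number of Vasicek parameters), the p-value uses the asymptotic Kolmogorov series with a common small-sample adjustment of the scaling, and all function names are hypothetical.

```python
import math
import random


def chi2_3_cdf(x):
    """CDF of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of the sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the alternating series is numerically unstable here
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * series))


def ks_test_wald(wald_statistics):
    """K-S statistic and p-value for H0: W_n(J) ~ chi-square(3)."""
    d = ks_statistic(wald_statistics, chi2_3_cdf)
    return d, ks_pvalue(d, len(wald_statistics))


if __name__ == "__main__":
    rng = random.Random(7)
    # Stand-in for 1,000 simulated Wald statistics: exact chi-square(3) draws.
    draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(1000)]
    print(ks_test_wald(draws))
    # A shifted sample should be rejected decisively.
    print(ks_test_wald([w + 5.0 for w in draws]))
```

In an actual replication, the list passed to `ks_test_wald` would hold the Wald statistics computed from the simulated sample paths, one per path, for a given pair (n, δ) and order J.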



Table 4
p-values of Kolmogorov-Smirnov test for $W_n(J) \sim \chi^2_3$

Situation        n        δ         J = 1     J = 2
δ = n^{-1/6}     500      0.3550    0.3524    0.0587
                 1,000    0.3162    0.4595    0.5830
                 2,000    0.2817    0.1149    0.2710
                 4,000    0.2510    0.0019    0.8309
                 8,000    0.2236    5.74e-8   0.6002
δ = n^{-1/2}     500      0.0447    5.04e-7   2.45e-8
                 1,000    0.0316    0.0003    9.72e-5
                 2,000    0.0224    0.0006    0.0003
                 4,000    0.0158    0.1109    0.0851
                 8,000    0.0112    0.0470    0.0367

We need the following technical assumptions in our analysis.

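(A.1) (i) $\Theta$ is a compact set in $R^d$, and the true parameter $\theta_0$ is an interior point of $\Theta$; (ii) for all values of the parameters $\theta$, Assumptions 1-3 in Aït-Sahalia (2002) hold; (iii) the drift function $\mu(x; \theta)$ is a bona fide function of $\theta$ for each $x$.

(A.2) (i) For every $\delta > 0$, $E\nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \theta_0) = 0$, and $\theta_0$ is the only root of $E\nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \theta) = 0$. (ii) The MLE $\hat\theta_n$ and the $J$-term approximate MLE $\hat\theta_n^{(J)}$ satisfy, respectively,

$$\sum_{t=1}^{n} \nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \hat\theta_n) = 0 \quad\text{and}\quad \sum_{t=1}^{n} \nabla_\theta \log f^{(J)}(X_t \mid X_{t-1}, \delta; \hat\theta_n^{(J)}) = 0.$$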

(A. 3) There exist finite positive constants A and K\ such that, for / 
1,2,3, any 5 G (0, A], ii, *2, *3 £ {1, . . . ,d} and j = 1 and 2, 



Esup 

6»G0 



tfAjiXtlXt-!^ 



<Ki. 



de n ---dd k 

(A. 4) There exist finite positive constants V[ for I = 0, 1,2 and 3, A > 
and K2 such that v$ > 3, 1/2 > v\ > 3, 1^3 > 1 and for any £1, . . . ,i% G {1, . . . , d} 

and <5g(0,A], 



E^sup 
eee 



£ 



z=o 



^Q(7(X t ;^)|7(X t _ i; 0);0) 



/! 



i'i ■ 



<K 2 . 



de n ---de iq 

(A.5) For any $\delta > 0$, the Fisher information matrix

$$I(\delta) := -E\nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \theta_0)$$

is invertible and as $\delta \to 0$ the largest eigenvalue of $\delta I^{-1}(\delta)$ is bounded away from infinity.

(A. 6) For each positive integer K, which may be infinite, and any 5 G 

(0,A], 

r K a 

5>( 7 (X t ;0)| 7 (Xt-i;0);r 



inf 
6»ee 



z=o 



/! 



0^=0. 



(A. 7) For any /3 > 1 and 77 > 0, there exists A(/3,ry) > 0, then for any 
5 G (0, A(/3,r/)] and X, where K may be infinite, 



inf 
eee 



K 



5>(7(A t ;#)| 7 (A t _i;#); 



;=o 



/! 



< rj 1/l3 > <rj. 



Assumptions (A.1) and (A.2) are standard requirements for maximum likelihood estimators. In particular, (A.1)(ii) contains conditions on the smoothness of the drift and the diffusion which ensure the existence of a unique solution to (2.1), as well as the infinite differentiability of the transition density $f(x \mid x_0, \delta; \theta)$ with respect to $x$, $x_0$ and $\delta$, and its three times differentiability with respect to $\theta$ [Friedman (1964)]. The second part of (A.2) is the simplified approach of Cramér (1946), which assumes the MLEs are the solutions of the likelihood score equations. Assumption (A.3) is needed to guarantee that the third derivative of $\log f(X_t \mid X_{t-1}, \delta; \theta)$ with respect to $\theta$ can be controlled by an integrable function, while condition (A.4) ensures the absolute convergence of the infinite series $\sum_{l=0}^{\infty} c_l(\gamma(X_t; \theta) \mid \gamma(X_{t-1}; \theta))\delta^l/l! = \exp\{A_3(x \mid x_0, \delta; \theta)\}$. Aït-Sahalia (2002) has provided conditions on the nondegeneracy of the diffusion function and on the boundary behavior which, together with the third part of condition (A.1), lead to the convergence of this infinite series. Condition (A.4) is also needed to allow exchange of differentiation and summation for the infinite series. The first part of (A.5) is standard in likelihood inference. Its second part reflects the fact that for some processes $\lim_{\delta \to 0} I(\delta)$ may be singular, as conveyed in our discussion in Section 6 for the Vasicek process. Condition (A.6) is needed to guarantee that the derivatives of the log transition density and the log approximate transition density exist with probability one. Condition (A.7) is needed to manage the denominators in the derivatives of the log approximate transition density, ensuring that the probability of their taking small values can be controlled uniformly.

We shall give the proofs for the propositions and theorems mentioned in 
Sections 3-6. We first present some lemmas about the true transition density 
and its approximations, which we will use in later proofs. The proofs for the 
lemmas can be found in Chang and Chen (2011). 

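Lemma 1. Under (A.1) and (A.4), for any $\delta \in (0, \Delta)$, the infinite series

$$\sum_{l=0}^{\infty} c_l(\gamma(X_t; \theta) \mid \gamma(X_{t-1}; \theta))\frac{\delta^l}{l!}$$

absolutely converges with probability 1, and for $k = 1, 2$ and 3, and $i_1, i_2, i_3 \in \{1, \ldots, d\}$,

$$\frac{\partial^k}{\partial\theta_{i_1} \cdots \partial\theta_{i_k}} \sum_{l=0}^{\infty} c_l(\gamma(X_t; \theta) \mid \gamma(X_{t-1}; \theta))\frac{\delta^l}{l!} = \sum_{l=0}^{\infty} \frac{\partial^k}{\partial\theta_{i_1} \cdots \partial\theta_{i_k}} c_l(\gamma(X_t; \theta) \mid \gamma(X_{t-1}; \theta))\frac{\delta^l}{l!}.$$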

Lemma 2. Under (A.6) and (A.7), for any positive $\beta > 1$, there exist two constants $m(\beta) < \infty$ and $\Delta_1(\beta) > 0$ such that for any $\delta \in (0, \Delta_1(\beta)]$






and J, where J can be infinity, then 



E< sup 



£>(7(^;0)l7pQ-i;0)) 



1=0 



/! 



<m(/3). 



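Lemma 3. Under (A.1), (A.3), (A.4), (A.6), (A.7), there exist two constants $M_1 < \infty$ and $\Delta_2 > 0$ such that, for any $J$, where $J$ can be infinity, $\delta \in (0, \Delta_2)$ and $i, j, k \in \{1, \ldots, d\}$,

$$E\left\{\sup_{\theta \in \Theta}\left|\frac{\partial^3 \log f^{(J)}(X_t \mid X_{t-1}, \delta; \theta)}{\partial\theta_i\, \partial\theta_j\, \partial\theta_k}\right|\right\} < M_1.$$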



Proof of Proposition 1. Using the same method in the proof of 
Lemma 3, we know (a) holds. On the other hand, Lemma 3 implies (b). □ 

Proof of Proposition 2. See the proof of Proposition 2 in Chang 
and Chen (2011). □ 

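Proof of Proposition 3. Recall Proposition 2, then

$$\|I^{-1}(\delta)N(\theta_0, J, \delta) + E_d\|_2 \le \|I^{-1}(\delta)\|_2 \cdot \|N(\theta_0, J, \delta) + I(\delta)\|_2 \le C\delta^{J}.$$

If $C\delta^{J} < 1$, then

$$\|N^{-1}(\theta_0, J, \delta)I(\delta) + E_d\|_2 \le \frac{\|I^{-1}(\delta)N(\theta_0, J, \delta) + E_d\|_2}{1 - \|I^{-1}(\delta)N(\theta_0, J, \delta) + E_d\|_2}.$$

From Proposition 2, if $C\delta^{J+1} < 1$, then

$$\|N^{-1}(\theta_0, J, \delta) + I^{-1}(\delta)\|_2 \le \frac{\|I^{-1}(\delta)\|_2^2\, \|N(\theta_0, J, \delta) + I(\delta)\|_2}{1 - \|I^{-1}(\delta)\|_2\, \|N(\theta_0, J, \delta) + I(\delta)\|_2}.$$

On the other hand, using the same method as in the proof of Proposition 2, we have

$$\|U(\theta_0, J, \delta)\|_2 \le C\delta^{J+1}$$

for any positive $J$ and $\delta \in (0, \Delta)$. Hence, we can find constants $C_1$, $C_2$ and $\Delta > 0$ such that

$$\|N^{-1}(\theta_0, J, \delta)I(\delta) + E_d\|_2 \le C_1\delta^{J} \quad\text{and}\quad \|N^{-1}(\theta_0, J, \delta)U(\theta_0, J, \delta)\|_2 \le C_2\delta^{J}$$

for any positive integer $J$ and $\delta \in (0, \Delta)$. □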

Proof of Proposition 4. Use the same method in the proof of Propo- 
sition 2. □ 

Proof of Proposition 5. We'll use Corollary 2.1 in Newey (1991) to 
prove this proposition. We only need to verify three conditions under two 
situations mentioned in Proposition 5: 
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(i) for any i G {1, . . . , d} , 







E^ — — logf(X t \X t -i,5;9) }> is equicontinuous; 



(ii) for any i G {1, . . . ,d}, 

D 2 



sup 



o P (i); 



t=\ % 

(iii) for any i G {1, . . . ,d} and 9 G 8, 

1 ^ Q (Pi ^ 

t=i * ^ * J 



For any e*,^** G 9, note that 

<!^-log/(X t |AVi,<5;#* 



<! Aiog/(X t |X t _i,(5;r 



E 



c) 2 



-logf(X t \X t _ 1 ,S;9)\-(9*-9*% 



dOi 89' 

where 9 is on the joint line between 9* and 9**. Then 



<-^-logf(X t \X t ^,6;9* 



t-^-\ogf{Xt\Xt- U 6;0* 



< 



E 



d 2 



do, ae 



7 ]ogf(Xt\X t - U 6;6) 



For any j G {1, . . . ,d}, use the same method in the proof of Lemma 3, we 
know that there exists a constant C, which is not dependent on J and 5, 
and A > such that, for any J and 5 G (0, A], 

■]ogf(X t \Xt-i,5;e) 



Eisup 
Uee 



d9 l 89 j 



<C. 



Hence, (i) and (ii) can be established. 

To verify (iii), from (A. 3) [Lemmas 3 and 4 in Ait-Sahalia and Mykland 
(2004)], we know that there exists a positive constant k such that for any 
h <t 2 , 



E 



— logf(X tl \X tl _ 1 ,5;9)-El—logf(X tl \X tl _ u 5;9 

— log/(X t2 |X t2 _ 1)( 5;0)-E| — log/(X t2 |X t2 _i,<5; 

< C ■ exp{— n{t2 — ti)5}, 
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where 

C = E 
Then 



n 



t=i 



— log/(X t |X t _i,<5;e)-EJ — log f{X t \X t ^8;9) 



— \ogf(x t \x t ^5-e)-E\—\o g f{x t \x t ^8-e) 



C C exp{-/«(J} 



<3 



n n 1 — exp{— n5} 
2v x 



2Ki + K 2 -m 



i/i-2 



1 



+ 



1 



n ra[exp(fi;<5) — 1] 



0, 



under the two situations mentioned in the statement of Proposition 5. Hence 
we complete the proof. □ 

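Proof of Proposition 6. From (A.2), we have $n^{-1}\sum_{t=1}^{n} \nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \hat\theta_n) = 0$. Expanding it at $\theta_0$,

$$0 = \frac{1}{n}\sum_{t=1}^{n} \nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \theta_0) + \frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \tilde\theta) \cdot (\hat\theta_n - \theta_0),$$

where $\tilde\theta$ lies on the line segment joining $\hat\theta_n$ and $\theta_0$. Then

$$\hat\theta_n - \theta_0 = \left\{-\frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \tilde\theta)\right\}^{-1} \frac{1}{n}\sum_{t=1}^{n} \nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \theta_0).$$

Define $I_n(\delta) = -n^{-1}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \theta_0)$. From Lemma 3, $-n^{-1}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \tilde\theta) = I_n(\delta) \cdot \{1 + o_p(1)\}$. Using the same argument as in the verification of (iii) in the proof of Proposition 5, we get $I_n(\delta) - I(\delta) = O_p\{(n\delta)^{-1/2}\}$. If $n\delta^{3} \to \infty$, by (A.5),

$$\left\{-\frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \tilde\theta)\right\}^{-1} = \{I(\delta) \cdot \{1 + o_p(1)\} + O_p\{(n\delta)^{-1/2}\}\}^{-1} = I^{-1}(\delta) \cdot \{1 + o_p(1)\}.$$

Then

$$\sqrt{n}\, I^{1/2}(\delta)(\hat\theta_n - \theta_0) = I^{-1/2}(\delta) \cdot \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \theta_0) \cdot \{1 + o_p(1)\}.$$

We will use the martingale central limit theorem [Billingsley (1995), page 476] to show that the first factor on the right-hand side of the above equation converges to a standard normal distribution. For any $a \in R^d$ with unit $L_2$ norm, to simplify notation, let $U_{n,m} = a'I^{-1/2}(\delta)n^{-1/2}\nabla_\theta \log f(X_m \mid X_{m-1}, \delta; \theta_0)$ and $\mathcal{F}_{n,m} = \sigma(X_1, \ldots, X_m)$. It is easy to check that $(U_{n,m}, \mathcal{F}_{n,m})$ is a martingale difference array. By the Markov property and Birkhoff's ergodic theorem, $V_n = \sum_{m=1}^{n} E(U_{n,m}^2 \mid \mathcal{F}_{n,m-1}) \xrightarrow{p} EU_{1,m}^2 = 1$. On the other hand, $\sum_{m=1}^{n} E|U_{n,m}|^3 \le C(n\delta^{3})^{-1/2} \to 0$. This implies the asymptotic normality of $\sqrt{n}\, a'I^{1/2}(\delta)(\hat\theta_n - \theta_0)$, which completes the proof. □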

Proof of Theorem 1. From Propositions 4 and 5, we can get

$$\|E\nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \hat\theta_n^{(J)})\|_2 \xrightarrow{p} 0$$

for either: (i) $\delta \in (0, \Delta \wedge \bar\Delta]$ being fixed, $J \to \infty$ and $n \to \infty$, or (ii) $J$ being fixed, $n \to \infty$, $\delta \to 0$ but $n\delta \to \infty$. Hence, noting condition (A.2)(i), we have the consistency of the AMLE $\hat\theta_n^{(J)}$. □

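Proof of Theorem 2. For fixed $\delta$, from Theorem 1 and (4.1), we know that the leading order term of $\hat\theta_n^{(J)} - \theta_0$ contains two parts: one is $N^{-1}U_n$, and the other is $N^{-1}(N_n + F_n)(\hat\theta_n - \theta_0)$. Hence, $\hat\theta_n^{(J)} - \theta_0 = O_p\{\delta^{J+1} + (n\delta)^{-1/2}\}$.

For $J$ fixed and $\delta \to 0$, Proposition 4 implies

$$E\left\{\left\|\frac{1}{n}\sum_{t=1}^{n} \nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \hat\theta_n^{(J)}) - \frac{1}{n}\sum_{t=1}^{n} \nabla_\theta \log f(X_t \mid X_{t-1}, \delta; \hat\theta_n)\right\|_2\right\} \le C_1\delta^{J+1}.$$

This means that

$$E\left\{\left\|\frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \tilde\theta) \cdot (\hat\theta_n^{(J)} - \hat\theta_n)\right\|_2\right\} \le C\delta^{J+1},$$

where $\tilde\theta$ is on the line segment joining $\hat\theta_n^{(J)}$ and $\hat\theta_n$. Hence

$$\frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \tilde\theta) \cdot (\hat\theta_n^{(J)} - \hat\theta_n) = O_p(\delta^{J+1}).$$

Since $\tilde\theta \xrightarrow{p} \theta_0$ and $\hat\theta_n^{(J)} - \hat\theta_n = o_p(1)$,

$$\frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \theta_0) \cdot (\hat\theta_n^{(J)} - \hat\theta_n) = O_p(\delta^{J+1}).$$

On the other hand, from Proposition 2, we know

$$\frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f(X_t \mid X_{t-1}, \delta; \theta_0) - \frac{1}{n}\sum_{t=1}^{n} \nabla^2_{\theta\theta} \log f^{(J)}(X_t \mid X_{t-1}, \delta; \theta_0) = O_p(\delta^{J+1}).$$

Then $N_n(\hat\theta_n^{(J)} - \hat\theta_n) = O_p(\delta^{J+1})$. Using the same argument as in the verification of (iii) in the proof of Proposition 5, we know $N_n - N = O_p\{(n\delta)^{-1/2}\}$. As $n\delta^{3} \to \infty$, we have $N(\hat\theta_n^{(J)} - \hat\theta_n) = O_p(\delta^{J+1})$. Hence, $\hat\theta_n^{(J)} - \hat\theta_n = O_p(\delta^{J})$. At the same time, we know $\hat\theta_n - \theta_0 = O_p\{(n\delta)^{-1/2}\}$. Then

$$\hat\theta_n^{(J)} - \theta_0 = O_p\{\delta^{J} + (n\delta)^{-1/2}\}.$$

This completes the proof of Theorem 2. □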

Proof of Theorem 4. We only need to prove the following result:

$$\sqrt{n}\, I_n^{1/2}(\hat\theta_n^{(J)}, J, \delta)(\hat\theta_n^{(J)} - \theta_0) = \sqrt{n}\, I^{1/2}(\delta)(\hat\theta_n^{(J)} - \theta_0) + o_p(1)$$

under the two situations mentioned in Theorem 4. Using the approach in the proof of Lemma 3, we have $I_n(\hat\theta_n^{(J)}, J, \delta) - I_n(\theta_0, J, \delta) = O_p\{\|\hat\theta_n^{(J)} - \theta_0\|_2\}$. Also, using the same argument as in the verification of (iii) in the proof of Proposition 5, $I_n(\theta_0, J, \delta) - EI_n(\theta_0, J, \delta) = O_p\{(n\delta)^{-1/2}\}$. By the same argument as in the proof of Proposition 2, $EI_n(\theta_0, J, \delta) - I(\delta) = O(\delta^{J+1})$. Hence, if $n\delta^{3} \to \infty$, under either asymptotic regime in Theorem 4,

$$I_n^{1/2}(\hat\theta_n^{(J)}, J, \delta) = I^{1/2}(\delta) \cdot \{1 + o_p(1)\}.$$

This completes the proof. □